HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
541. Kubernetes-native distributed LLM inference framework (github.com/llm-d)
2 points by baijum, a year ago
542. Show HN: Contextual AI Document Parser – Infer hierarchy for long, complex docs
2 points by ishan_sinha, a year ago
543. Lambda calculus - compiler, type inference, and evaluator in less than 100 LOC (gist.github.com)
2 points by tearflake, a year ago
544. Protobuf-ts-types: zero-codegen TypeScript type inference from protobuf messages (github.com/nathanhleung)
2 points by 18nleung, a year ago
545. Eagle-3 Speculative Decoding for LLM Inference (5.6x speedup) (github.com/SafeAILab)
2 points by summarity, a year ago
546. Show HN: Kernel-level LLM inference via /dev/llm0 (github.com/randombk)
2 points by RandomBK, a year ago
547. Rust Type Inference Broke with Update to Deranged Crate (github.com/jhpratt)
2 points by nethunters, a year ago
548. DeepDive: In-Depth Decryption of LLMs Construction and Inference from Scratch (github.com/therealoliver)
2 points by therealoliver, a year ago
549. Show HN: OptiLLMBench – Test how inference optimization tricks scale up LLMs
2 points by codelion, a year ago
550. Deepseek.cpp: CPU inference for the DeepSeek family of LLMs in pure C++ (github.com/andrewkchan)
2 points by hedgehog0, a year ago
551. Jlama: LLM Inference Engine for Java (github.com/tjake)
2 points by saikatsg, a year ago
552. Show HN: EmbedAnything – Rust Powered Inference, Ingestion and Indexing (github.com/StarlightSearch)
2 points by Sonam_AI, 2 years ago
553. JetStream: Throughput+memory optimized engine for LLM inference on XLA devices (github.com/google)
2 points by lnyan, 2 years ago
554. Duck-Lisp: optional free-form parenthesis inference (github.com/oitzujoey)
2 points by nemoniac, 2 years ago
555. Jlama – a modern LLM inference engine for Java (github.com/tjake)
2 points by simonpure, 2 years ago
556. A minimal Python implementation of Hindley-Milner type inference (github.com/ethe)
2 points by ethegwo, 2 years ago
557. Cake: a Rust framework for distributed inference of large models like LLama3 (github.com/evilsocket)
2 points by mnoorfawi, 2 years ago
558. Instant ONNX export for ML inference (github.com/Quantco)
2 points by agoel4512, 2 years ago
559. Show HN: Model Gateway – bridging your apps with LLM inference endpoints (github.com/modelgw)
2 points by projectstarter, 2 years ago
560. Llama3 Inference in Pure Java (github.com/mukel)
2 points by mikepapadim, 2 years ago
561. llm.f90: LLM Inference in Fortran (github.com/rbitr)
2 points by tosh, 2 years ago
562. SGLang: Fast and Expressive LLM Inference with RadixAttention for 5x Throughput (github.com/skypilot-org)
2 points by covi, 2 years ago
563. Inference of Mamba models in pure C (github.com/kroggen)
2 points by kroggen, 2 years ago
564. Mamba LLM Inference on CPU (github.com/rbitr)
2 points by andy99, 3 years ago
565. Official PR Reveals the Inference Code for Mixtral 8x7B (github.com/vllm-project)
2 points by georgehill, 3 years ago
566. Stable-fast for SD inference: Faster than AITemplate, On par with TensorRT (github.com/chengzeyi)
2 points by chengzeyi, 3 years ago
567. DeepSpeed-FastGen: High-Throughput for LLMs via MII and DeepSpeed-Inference (github.com/microsoft)
2 points by CharlesW, 3 years ago
568. Show HN: Llama2 inference in one file of pure OCaml (github.com/jackpeck)
2 points by 0c, 3 years ago
569. Tairov/llama2.mojo: Inference Llama 2 in one file of pure (github.com/tairov)
2 points by freediver, 3 years ago
570. Llama2 Inference in pure Mojo (github.com/tairov)
2 points by atairov, 3 years ago