HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
211.
Auto-unloading models using __init_subclass__ (Python)
github.com/Vrroom
1 comment
3 years ago
matroid
1 point
212.
Bookish: math-infested markdown to HTML and latex
github.com/parrt
discuss
8 years ago
ingve
1 point
213.
Show HN: Mamba-Chat – A Chat LLM Based on State Space Models
github.com/havenhq
discuss
3 years ago
justusmattern
9 points
214.
Ask HN: Which cloud provider offers AMD MI250/MI300?
5 comments
2 years ago
fzysingularity
2 points
215.
Show HN: Distill – Remove redundant RAG context in 12ms, no LLM calls
discuss
6 months ago
sidk24
2 points
216.
Threads can infect each other with their low priority
github.com/Dobiasd
35 comments
7 years ago
Dobiasd
68 points
217.
Llama2.c: Inference llama 2 in one file of pure C
github.com/karpathy
165 comments
3 years ago
anjneymidha
707 points
218.
The path to open-sourcing the DeepSeek inference engine
github.com/deepseek-ai
63 comments
a year ago
Palmik
550 points
219.
DeepSeek open source DeepEP – library for MoE training and Inference
github.com/deepseek-ai
71 comments
a year ago
helloericsf
536 points
220.
DeepSeek 4 Flash local inference engine for Metal
github.com/antirez
159 comments
2 months ago
tamnd
499 points
221.
Flux 2 Klein pure C inference
github.com/antirez
141 comments
5 months ago
antirez
453 points
222.
Gemma.cpp: lightweight, standalone C++ inference engine for Gemma models
github.com/google
130 comments
2 years ago
mfiguiere
422 points
223.
BitNet: Inference framework for 1-bit LLMs
github.com/microsoft
167 comments
3 months ago
redm
370 points
224.
Exllamav2: Inference library for running LLMs locally on consumer-class GPUs
github.com/turboderp
125 comments
3 years ago
Palmik
322 points
225.
Pure C, CPU-only inference with Mistral Voxtral Realtime 4B speech to text model
github.com/antirez
35 comments
4 months ago
Curiositry
311 points
226.
Lm.rs: Minimal CPU LLM inference in Rust with no dependency
github.com/samuel-vitorino
76 comments
2 years ago
littlestymaar
310 points
227.
Web LLM – WebGPU Powered Inference of Large Language Models
github.com/mlc-ai
80 comments
3 years ago
summarity
276 points
228.
Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon
github.com/RunanywhereAI
153 comments
3 months ago
sanchitmonga22
240 points
229.
A general-purpose probabilistic programming system with programmable inference
github.com/probcomp
72 comments
7 years ago
espeed
238 points
230.
Hypura – A storage-tier-aware LLM inference scheduler for Apple Silicon
github.com/t8
85 comments
3 months ago
tatef
221 points
231.
Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA
github.com/jmaczan
18 comments
24 days ago
yu3zhou4
205 points
232.
Gluon – A static, type-inferred and embeddable language written in Rust
github.com/gluon-lang
94 comments
8 years ago
Lapz
203 points
233.
Llama.rs – Rust port of llama.cpp for fast LLaMA inference on CPU
github.com/setzer22
24 comments
3 years ago
rrampage
202 points
234.
Show HN: We made our own inference engine for Apple Silicon
github.com/trymirai
46 comments
a year ago
darkolorin
186 points
235.
Microsoft BitNet: inference framework for 1-bit LLMs
github.com/microsoft
33 comments
2 years ago
galeos
173 points
236.
Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework
github.com/ai-dynamo
39 comments
a year ago
ashvardanian
150 points
237.
LLMLingua: Compressing Prompts for Faster Inferencing
github.com/microsoft
47 comments
3 years ago
TarqDirtyToMe
149 points
238.
Show HN: Zero-codegen, no-compile TypeScript type inference from Protobufs
github.com/nathanhleung
73 comments
a year ago
18nleung
138 points
239.
Gluon: A static, type inferred and embeddable language written in Rust
github.com/Marwes
48 comments
10 years ago
jswny
136 points
240.
Launch HN: Cactus (YC S25) – AI inference on smartphones
github.com/cactus-compute
63 comments
9 months ago
HenryNdubuaku
123 points