HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
Request
391.
▲
Show HN: Kremis – Rust graph DB; every answer is fact, inference, or unknown
github.com/TyKolt
1 comment
3 months ago
TyKolt
3 points
392.
▲
vLLM-mlx – 65 tok/s LLM inference on Mac with tool calling and prompt caching
github.com/raullenchai
1 comment
4 months ago
raullen
3 points
393.
▲
A Distributed Inference Framework Enabling Running Models Exceeding Total Memory
github.com/firstbatchxyz
1 comment
7 months ago
driaforall
3 points
394.
▲
Metaphysical Priming reduces Gemini 3.0 Pro inference latency by 60%
github.com/Cactus-mp4
1 comment
7 months ago
cactus-jpg
3 points
395.
▲
SQLite AI – Local AI Inference, Powered by SQLite
github.com/sqliteai
1 comment
a year ago
marcobambini
3 points
396.
▲
Neural Amp Modeler inference in web browsers using WebAssembly (TONE3000)
github.com/tone-3000
1 comment
a year ago
woodybury
3 points
397.
▲
Fastgen – SOTA LLM inference in 3k lines of Python
github.com/facebookresearch
1 comment
a year ago
mpu
3 points
398.
▲
Cake: Distributed LLM and StableDiffusion inference for mobile, desktop, or server
github.com/evilsocket
1 comment
a year ago
ethanpil
3 points
399.
▲
Geniusrise – inference APIs for text, vision, audio, multi-modal AI models
github.com
1 comment
2 years ago
ixaxaar
3 points
400.
▲
Show HN: Collider – the platform for local LLM debug and inference at warp speed
github.com/gotzmann
1 comment
3 years ago
Ambix
3 points
401.
▲
Takeoff Inference Server Is Now Open Source
github.com/titanml
1 comment
3 years ago
mezark
3 points
402.
▲
Show HN: GPT-2 inference on the CPU using C/C++
github.com/ggerganov
1 comment
4 years ago
ggerganov
3 points
403.
▲
Hummingbird, compiles trained ML models into tensor computation for inference
github.com/microsoft
1 comment
6 years ago
tourist_on_road
3 points
404.
▲
HE-Transformer: Deep Learning Inference with Homomorphic Encryption
github.com/NervanaSystems
1 comment
8 years ago
ArtWomb
3 points
405.
▲
Inferring polygon vertices with a VGG-16 model
github.com/AidanRocke
1 comment
8 years ago
aidanrocke
3 points
406.
▲
Show HN: Piqc – GPU waste scanner for LLM inference clusters
github.com/paralleliq
discuss
22 days ago
paralleliq
3 points
407.
▲
Show HN: Llmff v1.0 FFmpeg for Inference
github.com/syndicalt
discuss
24 days ago
syndicalt
3 points
408.
▲
Atlas TQ1_0 – Pure C++ Ternary (1.58-Bit) Inference Engine for CPU
github.com/xxxn3m3s1sxxx
discuss
a month ago
xxxn3m3s1sxxx
3 points
409.
▲
Atlas – Pure Rust Inference Engine
github.com/Avarok-Cybersecurity
discuss
a month ago
danborn26
3 points
410.
▲
Show HN: Valkyr LM Inference with Realtime Guarantees
github.com/Foundation42
discuss
2 months ago
quatonion
3 points
411.
▲
Show HN: Coelanox – auditable inference runtime in Rust (BERT runs today)
coelanox.com
discuss
2 months ago
Shark1n4Suit
3 points
412.
▲
LLM inference load balancer optimized for AMD Radeon VII GPUs
github.com/janit
discuss
3 months ago
velmu
3 points
413.
▲
RvLLM: High-performance LLM inference in Rust
github.com/m0at
discuss
3 months ago
mji
3 points
414.
▲
Rust-native hybrid training and inference engine for Apple Neural Engine and GPU
github.com/ncdrone
discuss
3 months ago
ngaut
3 points
415.
▲
Show HN: Doppler.js – WebGPU inference, faster/simpler than transformer.js
discuss
4 months ago
clocksmith
3 points
416.
▲
OMLX – LLM Inference Server for Apple Silicon (Ollama for MLX)
github.com/jundot
discuss
4 months ago
fintechie
3 points
417.
▲
Free LLM API Resources – A List of Free LLM Inference APIs
github.com/cheahjs
discuss
4 months ago
willmarquis
3 points
418.
▲
A tiny LM that does inference at compile time
github.com/erodola
discuss
5 months ago
signa11
3 points
419.
▲
Show HN: ~950 line inference engine, on par with vLLM
github.com/naklecha
discuss
6 months ago
naklecha
3 points
420.
▲
Bare-Metal Llama 2 Inference in C++20 (No Frameworks, ARM Neon)
github.com/farukalpay
discuss
6 months ago
ornurla
3 points