HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
331.
▲
Pipeline-parallel LLM inference across GPUs on separate machines
github.com/leyten
discuss
4 days ago
ngaut
5 points
332.
▲
Show HN: FlashQwen – A from-scratch CUDA inference engine for Qwen3
github.com
discuss
8 days ago
langtang1996
5 points
333.
▲
AI Agent that at inference time updates its harness and model weights
github.com/hexo-ai
discuss
23 days ago
martianvoid
5 points
334.
▲
Show HN: Smile-Serve – Inference Server for ML, ONNX, and LLM
github.com/haifengl
discuss
2 months ago
haifeng
5 points
335.
▲
vLLM introduces memory optimizations for long-context inference
github.com/vllm-project
discuss
3 months ago
addisud
5 points
336.
▲
Zinc – LLM inference engine written in Zig, running 35B models on $550 AMD GPUs
github.com/zolotukhin
discuss
3 months ago
mvdwoord
5 points
337.
▲
Show HN: Llmtop – Htop for LLM Inference Clusters (vLLM, SGLang, Ollama, llama)
github.com/InfraWhisperer
discuss
3 months ago
rpotluri
5 points
338.
▲
MetalChat – Llama Inference for Apple Silicon
github.com/ybubnov
discuss
4 months ago
ybubnov
5 points
339.
▲
Voxtral.c: Voxtral Realtime 4B model inference as a C library
github.com/antirez
discuss
5 months ago
antirez
5 points
340.
▲
llama2.zig: Inference Llama 2 in one file of pure Zig
github.com/cgbur
discuss
7 months ago
tosh
5 points
341.
▲
T-Mac: Low-bit LLM inference on CPU/NPU with lookup table
github.com/microsoft
discuss
9 months ago
nateb2022
5 points
342.
▲
Show HN: gline-rs – an inference engine for GLiNER models, in Rust
github.com/fbilhaut
discuss
a year ago
fbilhaut
5 points
343.
▲
Fast LLM Inference in Rust
github.com/EricLBuehler
discuss
2 years ago
goranmoomin
5 points
344.
▲
Fast and hackable PyTorch native transformer inference
github.com/pytorch-labs
discuss
3 years ago
gavi
5 points
345.
▲
Lepton: An open-source library (Apache 2.0) for scaling model inference
github.com/leptonai
discuss
3 years ago
Jimmc414
5 points
346.
▲
Run LLaMA Inference on CPU, with Rust
github.com/rustformers
discuss
3 years ago
kristianpaul
5 points
347.
▲
Three-processor inference on AMD Ryzen AI 300
github.com/Peterc3-dev
2 comments
3 months ago
peterc3dev
4 points
348.
▲
LangPatrol: A static analyzer for LLM prompts that catches bugs before inference
github.com/langpatrol
2 comments
6 months ago
mmarvin
4 points
349.
▲
Show HN: Inference Mixtral 8x7B in pure Rust
github.com/moritztng
2 comments
2 years ago
molli
4 points
350.
▲
Show HN: Ggml.js – Serverless AI Inference in the Browser with WebAssembly
rahuldshetty.github.io
2 comments
3 years ago
anonymousd3vil
4 points
351.
▲
TensorSharp: Open-Source Local LLM Inference Engine
github.com/zhongkaifu
1 comment
20 days ago
zhongkaifu
4 points
352.
▲
Train and inference GPT in 243 lines of pure, dependency-free Python by Karpathy
gist.github.com
1 comment
4 months ago
itvision
4 points
353.
▲
PasLLM: An Object Pascal inference engine for LLM models
github.com/BeRo1985
1 comment
7 months ago
nor-and-or-not
4 points
354.
▲
Distributed-Llama: Connect home devices into a cluster for LLM inference
github.com/b4rtaz
1 comment
a year ago
tosh
4 points
355.
▲
Practical Llama 3 inference in Java
github.com/mukel
1 comment
2 years ago
mukel
4 points
356.
▲
Llama.cpp speculative sampling: 2x faster inference for large models
github.com/ggerganov
1 comment
3 years ago
bobivl
4 points
357.
▲
Zig GPT-2 inference engine
github.com/EugenHotaj
1 comment
3 years ago
eugenhotaj
4 points
358.
▲
Stable Diffusion inference locally on iOS / macOS using MPSGraph
github.com/mortenjust
1 comment
4 years ago
consumer451
4 points
359.
▲
Pytype checks and infers types for your Python code
github.com/google
1 comment
7 years ago
mkesper
4 points
360.
▲
Inferential database seeding in Clojure
michaeldrogalis.github.com
discuss
14 years ago
MichaelDrogalis
4 points