Search: github.com/tnfe | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

211.

Auto-unloading models using __init_subclass__ (Python)

github.com/Vrroom

3 years ago

1 point

212.

Bookish: math-infested markdown to HTML and latex

github.com/parrt

8 years ago

1 point

213.

Show HN: Mamba-Chat – A Chat LLM Based on State Space Models

github.com/havenhq

3 years ago

9 points

214.

Ask HN: Which cloud provider offers AMD MI250/MI300?

2 years ago

2 points

215.

Show HN: Distill – Remove redundant RAG context in 12ms, no LLM calls

6 months ago

2 points

216.

Threads can infect each other with their low priority

github.com/Dobiasd

7 years ago

68 points

217.

Llama2.c: Inference llama 2 in one file of pure C

github.com/karpathy

3 years ago

707 points

218.

The path to open-sourcing the DeepSeek inference engine

github.com/deepseek-ai

a year ago

550 points

219.

DeepSeek open source DeepEP – library for MoE training and Inference

github.com/deepseek-ai

a year ago

536 points

220.

DeepSeek 4 Flash local inference engine for Metal

github.com/antirez

2 months ago

499 points

221.

Flux 2 Klein pure C inference

github.com/antirez

5 months ago

453 points

222.

Gemma.cpp: lightweight, standalone C++ inference engine for Gemma models

github.com/google

2 years ago

422 points

223.

BitNet: Inference framework for 1-bit LLMs

github.com/microsoft

3 months ago

370 points

224.

Exllamav2: Inference library for running LLMs locally on consumer-class GPUs

github.com/turboderp

3 years ago

322 points

225.

Pure C, CPU-only inference with Mistral Voxtral Realtime 4B speech to text model

github.com/antirez

4 months ago

311 points

226.

Lm.rs: Minimal CPU LLM inference in Rust with no dependency

github.com/samuel-vitorino

2 years ago

310 points

227.

Web LLM – WebGPU Powered Inference of Large Language Models

github.com/mlc-ai

3 years ago

276 points

228.

Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon

github.com/RunanywhereAI

3 months ago

240 points

229.

A general-purpose probabilistic programming system with programmable inference

github.com/probcomp

7 years ago

238 points

230.

Hypura – A storage-tier-aware LLM inference scheduler for Apple Silicon

3 months ago

221 points

231.

Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

github.com/jmaczan

24 days ago

205 points

232.

Gluon – A static, type-inferred and embeddable language written in Rust

github.com/gluon-lang

8 years ago

203 points

233.

Llama.rs – Rust port of llama.cpp for fast LLaMA inference on CPU

github.com/setzer22

3 years ago

202 points

234.

Show HN: We made our own inference engine for Apple Silicon

github.com/trymirai

a year ago

186 points

235.

Microsoft BitNet: inference framework for 1-bit LLMs

github.com/microsoft

2 years ago

173 points

236.

Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework

github.com/ai-dynamo

a year ago

150 points

237.

LLMLingua: Compressing Prompts for Faster Inferencing

github.com/microsoft

3 years ago

149 points

238.

Show HN: Zero-codegen, no-compile TypeScript type inference from Protobufs

github.com/nathanhleung

a year ago

138 points

239.

Gluon: A static, type inferred and embeddable language written in Rust

github.com/Marwes

10 years ago

136 points

240.

Launch HN: Cactus (YC S25) – AI inference on smartphones

github.com/cactus-compute

9 months ago

123 points