HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
61.
▲
Quansloth Using Google's Turboquant Breaks the "VRAM Wall" for Local LLMs
github.com/PacifAIst
1 comment
2 months ago
gunzfanatic
2 points
62.
▲
Show HN: I built a <400ms latency voice agent that runs on a 4gb vram GTX 1650
github.com/pheonix-delta
1 comment
4 months ago
shubham-coder
2 points
63.
▲
Dead Simple Web UI for Training Flux LoRA with Low VRAM (12GB/16GB/20GB) Support
github.com/cocktailpeanut
discuss
2 years ago
cocktailpeanut
2 points
64.
▲
Show HN: Parakeet LLM Demo (378M param. 8GB VRAM)
discuss
2 years ago
razodactyl
2 points
65.
▲
Adjust VRAM/RAM Split on Apple Silicon
github.com/ggerganov
1 comment
3 years ago
tosh
1 point
66.
▲
2.3x KV Cache Compression at 32k Context – Cut VRAM Costs by 50%
github.com/Jamie2111
discuss
a month ago
JamieObala
1 point
67.
▲
Show HN: QKV Core – Run 7B LLMs on 4GB VRAM via surgical memory alignment
github.com/QKV-Core
discuss
6 months ago
broxytr
1 point
68.
▲
Super Merryo Trolls: An Adventure from the Days Before VRAM
github.com/GBirkel
discuss
2 years ago
vatys
1 point
69.
▲
Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks
github.com/antoinezambelli
252 comments
a month ago
zambelli
687 points
70.
▲
Show HN: InvokeAI, an open source Stable Diffusion toolkit and WebUI
github.com/invoke-ai
102 comments
4 years ago
sophrocyne
414 points
71.
▲
Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training
github.com/alainnothere
80 comments
3 months ago
xlayn
265 points
72.
▲
Launch HN: Deepsilicon (YC S24) – Software and hardware for ternary transformers
79 comments
2 years ago
areddyyt
189 points
73.
▲
Launch HN: General Instinct (YC P26) – Frontier models on edge devices
16 comments
17 days ago
guanming0717
63 points
74.
▲
Show HN: ZSE – Open-source LLM inference engine with 3.9s cold starts
github.com/Zyora-Dev
9 comments
4 months ago
zyoralabs
58 points
75.
▲
Show HN: I built a RISC-V emulator that runs DOOM
github.com/lalitshankarch
4 comments
2 months ago
Flex247A
50 points
76.
▲
Show HN: Local task classifier and dispatcher on RTX 3080
github.com/resilientworkflowsentinel
2 comments
5 months ago
Shubham_Amb
26 points
77.
▲
Show HN: KTransformers–236B Model and 1M Context LLM Inference on Local Machines
github.com/kvcache-ai
3 comments
2 years ago
sssummer
20 points
78.
▲
Show HN: Demon – open-source real-time music diffusion engine, 25Hz local GPU
daydreamlive.github.io
13 comments
a month ago
ryanontheinside
17 points
79.
▲
Show HN: Finetune Llama-3.1 2x faster in a Colab
colab.research.google.com
2 comments
2 years ago
danielhanchen
16 points
80.
▲
Show HN: Salad, a distributed cloud for AI (like Airbnb for GPUs)
4 comments
2 years ago
bobjmiles
15 points
81.
▲
Show HN: KTransformers:671B DeepSeek-R1 on a Single Machine-286 tokens/s Prefill
github.com/kvcache-ai
discuss
a year ago
sssummer
14 points
82.
▲
Show HN: Willow Inference Server: Optimized ASR/TTS/LLM for Willow/WebRTC/REST
github.com/toverainc
13 comments
3 years ago
kkielhofner
13 points
83.
▲
Show HN: Lightweight Llama3 Inference Engine – CUDA C
github.com/abhisheknair10
discuss
a year ago
abhisheknair10
12 points
84.
▲
Show HN: Automatic 1111, but as a Python Package
github.com/saketh12
discuss
2 years ago
saketh105
11 points
85.
▲
Show HN: Coderive – Iterating through 1 Quintillion Inside a Loop in just 50ms
github.com/DanexCodr
13 comments
6 months ago
DanexCodr
8 points
86.
▲
Show HN: onprem unstructured data extraction with 4 lines of code
github.com/NanoNets
discuss
a year ago
souvik3333
8 points
87.
▲
Show HN: Local GLaDOS
old.reddit.com
discuss
2 years ago
dnhkng
8 points
88.
▲
Show HN: WaveletLM – wavelet-based, attention-free model with O(n log n) scaling
github.com/ramongougis
1 comment
2 months ago
anarmorarm
7 points
89.
▲
Show HN: Serve 100 Large AI models on a single GPU with low impact to TTFT
github.com/leoheuler
1 comment
7 months ago
leonheuler
7 points
90.
▲
Show HN: Federation of robots collaboratively train an object manipulation model
github.com/adap
discuss
a year ago
jafermarq
7 points