Search: github.com/vrza | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

61.

Quansloth Using Google's Turboquant Breaks the "VRAM Wall" for Local LLMs

github.com/PacifAIst

2 months ago

2 points

62.

Show HN: I built a <400ms latency voice agent that runs on a 4GB VRAM GTX 1650

github.com/pheonix-delta

4 months ago

2 points

63.

Dead Simple Web UI for Training Flux LoRA with Low VRAM (12GB/16GB/20GB) Support

github.com/cocktailpeanut

2 years ago

2 points

64.

Show HN: Parakeet LLM Demo (378M param. 8GB VRAM)

2 years ago

2 points

65.

Adjust VRAM/RAM Split on Apple Silicon

github.com/ggerganov

3 years ago

1 point

66.

2.3x KV Cache Compression at 32k Context – Cut VRAM Costs by 50%

github.com/Jamie2111

a month ago

1 point

67.

Show HN: QKV Core – Run 7B LLMs on 4GB VRAM via surgical memory alignment

github.com/QKV-Core

6 months ago

1 point

68.

Super Merryo Trolls: An Adventure from the Days Before VRAM

github.com/GBirkel

2 years ago

1 point

69.

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

github.com/antoinezambelli

a month ago

687 points

70.

Show HN: InvokeAI, an open source Stable Diffusion toolkit and WebUI

github.com/invoke-ai

4 years ago

414 points

71.

Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training

github.com/alainnothere

3 months ago

265 points

72.

Launch HN: Deepsilicon (YC S24) – Software and hardware for ternary transformers

2 years ago

189 points

73.

Launch HN: General Instinct (YC P26) – Frontier models on edge devices

17 days ago

63 points

74.

Show HN: ZSE – Open-source LLM inference engine with 3.9s cold starts

github.com/Zyora-Dev

4 months ago

58 points

75.

Show HN: I built a RISC-V emulator that runs DOOM

github.com/lalitshankarch

2 months ago

50 points

76.

Show HN: Local task classifier and dispatcher on RTX 3080

github.com/resilientworkflowsentinel

5 months ago

26 points

77.

Show HN: KTransformers–236B Model and 1M Context LLM Inference on Local Machines

github.com/kvcache-ai

2 years ago

20 points

78.

Show HN: Demon – open-source real-time music diffusion engine, 25Hz local GPU

daydreamlive.github.io

a month ago

ryanontheinside

17 points

79.

Show HN: Finetune Llama-3.1 2x faster in a Colab

colab.research.google.com

2 years ago

16 points

80.

Show HN: Salad, a distributed cloud for AI (like Airbnb for GPUs)

2 years ago

15 points

81.

Show HN: KTransformers: 671B DeepSeek-R1 on a Single Machine, 286 tokens/s Prefill

github.com/kvcache-ai

a year ago

14 points

82.

Show HN: Willow Inference Server: Optimized ASR/TTS/LLM for Willow/WebRTC/REST

github.com/toverainc

3 years ago

13 points

83.

Show HN: Lightweight Llama3 Inference Engine – CUDA C

github.com/abhisheknair10

a year ago

12 points

84.

Show HN: Automatic 1111, but as a Python Package

github.com/saketh12

2 years ago

11 points

85.

Show HN: Coderive – Iterating through 1 Quintillion Inside a Loop in just 50ms

github.com/DanexCodr

6 months ago

8 points

86.

Show HN: onprem unstructured data extraction with 4 lines of code

github.com/NanoNets

a year ago

8 points

87.

Show HN: Local GLaDOS

2 years ago

8 points

88.

Show HN: WaveletLM – wavelet-based, attention-free model with O(n log n) scaling

github.com/ramongougis

2 months ago

7 points

89.

Show HN: Serve 100 Large AI models on a single GPU with low impact to TTFT

github.com/leoheuler

7 months ago

7 points

90.

Show HN: Federation of robots collaboratively train an object manipulation model

github.com/adap

a year ago

7 points