Show HN: NanoSLG – Hack Your Own Multi-GPU LLM Server (5x Faster, Educational) (github.com/Guney-olu)
1 point by geniusyan 4 months ago

I built NanoSLG as a minimal, educational inference server for LLMs such as Llama-3.1-8B. It supports Pipeline Parallelism (splitting the model's layers across GPUs), Tensor Parallelism (sharding each layer's weights across GPUs), and Hybrid modes that combine the two for scaling.
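To make the three modes concrete, here is a minimal, framework-free sketch that simulates each "GPU" in-process with plain Python lists. It assumes each layer is just a square matrix-vector product (no attention, norms, or KV cache), and every function name (`run_pipeline`, `tp_matvec`, `run_hybrid`, etc.) is hypothetical, not NanoSLG's API. What it shows is that all three ways of splitting the work give the same output as running on one device. They only change which device holds which weights.

```python
"""Toy simulation of pipeline, tensor, and hybrid parallelism.

Assumption: each "layer" is a plain square matrix-vector product and each
"GPU" is simulated in-process. This is an illustration, NOT NanoSLG's code.
"""
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    # y[i] = sum_j w[i][j] * x[j]; each row of w produces one output feature
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def run_single(layers: List[Matrix], x: Vector) -> Vector:
    # Reference path: every layer runs on one device, in order
    for w in layers:
        x = matvec(w, x)
    return x


def split_contiguous(items: list, n: int) -> List[list]:
    # Split items into n contiguous chunks whose sizes differ by at most one
    k, r = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_pipeline(layers: List[Matrix], x: Vector, num_gpus: int) -> Vector:
    # Pipeline parallelism: each simulated GPU owns a contiguous slice of layers;
    # its output activation is handed to the next stage (a real server would
    # transfer it between GPUs here)
    for stage in split_contiguous(layers, num_gpus):
        x = run_single(stage, x)
    return x


def tp_matvec(w: Matrix, x: Vector, num_gpus: int) -> Vector:
    # Tensor parallelism: shard the weight along its output dimension (rows).
    # Every shard sees the full input x and computes a slice of the output;
    # concatenating the slices stands in for an all-gather.
    out: Vector = []
    for shard in split_contiguous(w, num_gpus):
        out.extend(matvec(shard, x))
    return out


def run_tensor(layers: List[Matrix], x: Vector, num_gpus: int) -> Vector:
    # Every layer is split across all num_gpus devices
    for w in layers:
        x = tp_matvec(w, x, num_gpus)
    return x


def run_hybrid(layers: List[Matrix], x: Vector, pp_size: int, tp_size: int) -> Vector:
    # Hybrid: pp_size pipeline stages, each stage tensor-parallel over tp_size GPUs
    for stage in split_contiguous(layers, pp_size):
        x = run_tensor(stage, x, tp_size)
    return x


if __name__ == "__main__":
    # Four 4x4 layers with small integer-valued weights
    layers = [[[float((i + j + k) % 3 - 1) for j in range(4)] for i in range(4)]
              for k in range(4)]
    x = [1.0, 2.0, 3.0, 4.0]
    ref = run_single(layers, x)
    assert run_pipeline(layers, x, num_gpus=2) == ref
    assert run_tensor(layers, x, num_gpus=2) == ref
    assert run_hybrid(layers, x, pp_size=2, tp_size=2) == ref
    print("all parallel modes match the single-device result")
```

In a real multi-GPU server the hand-off between pipeline stages and the concatenation of tensor-parallel shards would be inter-GPU communication (for example a point-to-point send and an all-gather), and that traffic is what decides whether a given split is faster or slower; this sketch deliberately leaves it out.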