Continuous batching to increase LLM inference throughput and reduce p50 latency

Heykuki News

110 points | 3 years ago | 20 comments
