vLLM-mlx – 65 tok/s LLM inference on Mac with tool calling and prompt caching

Heykuki News

3 points

4 months ago

1 comment

