vLLM-mlx – 65 tok/s LLM inference on Mac with tool calling and prompt caching (github.com/raullenchai)
3 points by raullen 4 months ago