DeepSeek's multi-head latent attention and other KV cache tricks | Heykuki News


DeepSeek's multi-head latent attention and other KV cache tricks

292 points

a year ago

72 comments
