<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>KV-Cache on Kevin Keller</title>
    <link>https://kevinkeller.org/tags/kv-cache/</link>
    <description>Recent content in KV-Cache on Kevin Keller</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <managingEditor>kellerkev@gmail.com (Kevin Keller)</managingEditor>
    <webMaster>kellerkev@gmail.com (Kevin Keller)</webMaster>
    <copyright>© 2026 Kevin Keller</copyright>
    <lastBuildDate>Thu, 30 Apr 2026 10:00:00 +0000</lastBuildDate>
    <atom:link href="https://kevinkeller.org/tags/kv-cache/index.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title>TurboQuant KV Cache — Running 128B Models on Consumer Hardware</title>
      <link>https://kevinkeller.org/posts/turboquant-kv-cache-local-llm-consumer-hardware/</link>
      <pubDate>Thu, 30 Apr 2026 10:00:00 +0000</pubDate>
      <author>kellerkev@gmail.com (Kevin Keller)</author>
      <guid>https://kevinkeller.org/posts/turboquant-kv-cache-local-llm-consumer-hardware/</guid>
      <description>The KV cache is the memory wall that limits context length on consumer hardware. TurboQuant shrinks it 5x with minimal quality loss — here’s a ready-to-run build that packages llama.cpp with TurboQuant KV compression into a single conda install.</description>
    </item>
  </channel>
</rss>