<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>KV-Cache on Kevin Keller</title>
    <link>https://kevinkeller.org/tags/kv-cache/</link>
    <description>Recent content in KV-Cache on Kevin Keller</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <managingEditor>kellerkev@gmail.com (Kevin Keller)</managingEditor>
    <webMaster>kellerkev@gmail.com (Kevin Keller)</webMaster>
    <copyright>© 2026 Kevin Keller</copyright>
    <lastBuildDate>Thu, 30 Apr 2026 10:00:00 +0000</lastBuildDate>
    <atom:link href="https://kevinkeller.org/tags/kv-cache/index.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title>TurboQuant KV Cache — Running 128B Models on Consumer Hardware</title>
      <link>https://kevinkeller.org/posts/turboquant-kv-cache-local-llm-consumer-hardware/</link>
      <pubDate>Thu, 30 Apr 2026 10:00:00 +0000</pubDate>
      <author>kellerkev@gmail.com (Kevin Keller)</author>
      <guid>https://kevinkeller.org/posts/turboquant-kv-cache-local-llm-consumer-hardware/</guid>
      <description>The KV cache is the memory wall that limits context length on consumer hardware. TurboQuant shrinks it 5x with minimal quality loss — here’s a ready-to-run build that packages llama.cpp with TurboQuant KV compression into a single conda install.</description>
    </item>
  </channel>
</rss>