TurboQuant KV Cache — Running 128B Models on Consumer Hardware
The KV cache is the memory wall that limits context length on consumer hardware. TurboQuant shrinks it 5x with minimal quality loss. Here’s a ready-to-run build that packages llama.cpp with TurboQuant KV compression into a single conda install.
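To get a feel for why the KV cache becomes the bottleneck, here is a rough back-of-the-envelope sketch. It uses the standard sizing rule for attention with grouped KV heads (one key tensor and one value tensor per layer). The model shape below is hypothetical and chosen only for illustration, not taken from any model in this post, and the 5x figure is applied as a flat divisor rather than derived from how TurboQuant works.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV cache size for one sequence.

    The leading 2 counts keys and values; bytes_per_elem=2 assumes fp16 storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical model shape (assumption, for illustration only).
fp16 = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, context_len=131072)

# Apply the 5x reduction quoted above as a simple divisor.
compressed = fp16 / 5

print(f"fp16 KV cache:     {fp16 / 2**30:.1f} GiB")        # 40.0 GiB
print(f"at 5x compression: {compressed / 2**30:.1f} GiB")  # 8.0 GiB
```

With these assumed numbers, the cache alone at full context would not fit alongside the weights on a typical consumer GPU, while the compressed figure leaves far more headroom. Plug in your own model's layer count, KV head count, and head dimension to see where you land.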