KV cache for llama.cpp and ComfyUI

Here you go — I am bored! A working KV-cache system with 0 perplexity loss. I can't remember how far it goes, but it can go much further. Explore, use, share, enjoy. Backends: CUDA, Vulkan, NEON, Adreno, PyTorch (I can't remember what else), with integration for ComfyUI and llama.cpp. The only reason for the license is that it is the Shannon limit and describes the transformer itself; I won't be pursuing any money or stopping its use. I am seriously bored.

I don't normally complete projects, and I guess I didn't complete this one either... but it works. There is a model-weights compressor included too, though I haven't tested it yet, and I won't be doing any further work on this. So it should be pretty easy to see the quick wins:

- If you just follow the math, it leads to a 10D torus. Expand it out to 10 bands and run a "spinor" transform and you hit 8x–10x.
- You can reduce the skeleton to 2-bit. It's universal, so you could reduce it to 1-bit.
- You could create a predictor of where not to hit.
- You can reduce it and run entirely in cache. AMD has 64 MB of cache; that can fit a decent model.

There is lots you can do, so DO IT!
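The repo doesn't spell out its exact scheme here, so as one reading of "reduce the skeleton to 2-bit", here is a minimal, hypothetical sketch of per-row 2-bit uniform quantization of a K/V tensor. All names below are my own invention, not the repo's API:

```python
# Hedged sketch: plain asymmetric 2-bit quantization of a cached K/V block.
# This is an illustration of the general idea, not the repo's actual method.
import numpy as np

def quantize_2bit(x: np.ndarray):
    """Per-row asymmetric 2-bit quantization: values mapped to 4 levels (0..3)."""
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = (hi - lo) / 3.0               # 2 bits -> 3 intervals between 4 levels
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant rows
    q = np.clip(np.round((x - lo) / scale), 0, 3).astype(np.uint8)
    return q, scale, lo

def dequantize_2bit(q: np.ndarray, scale: np.ndarray, lo: np.ndarray) -> np.ndarray:
    """Reconstruct an approximation of the original tensor."""
    return q.astype(np.float32) * scale + lo

# Stand-in "KV cache" block: (heads, seq_len, head_dim)
kv = np.random.randn(8, 128, 64).astype(np.float32)
q, s, z = quantize_2bit(kv)
kv_hat = dequantize_2bit(q, s, z)
# Rounding error is bounded by half a quantization step per row.
```

Four levels per value means the payload shrinks to 2 bits per element (plus a per-row scale and offset), which is where the "fits in 64 MB of cache" arithmetic starts to become plausible for small models.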

nihilistau/shannon-prime
