KV cache for llama.cpp and ComfyUI

Here you go — I am bored! A working KV-cache system with 0 perplexity loss. I can't remember how far it goes, but it can go much further. Explore, use, share, enjoy. Backends: CUDA, Vulkan, NEON, Adreno, PyTorch (I can't remember what else), with integration for ComfyUI and llama.cpp. The only reason for the license is that it is the Shannon limit and describes the transformer itself; I won't be pursuing any money or stopping its use. I am seriously bored.

I don't normally complete projects, and I guess I didn't complete this one either... but it works. There is a model-weights compressor included too, though I haven't tested it yet, and I won't be doing any further work on this. So it should be pretty easy to see the quick wins:

- If you just follow the math, it leads to a 10D torus. Expand it out to 10 bands and run a "spinor" transform and you hit 8x–10x.
- You can reduce the skeleton to 2-bit. It's universal, so you could reduce it to 1-bit.
- You could create a predictor of where not to hit.
- You can reduce it and run entirely in cache. AMD has 64 MB of cache; that can fit a decent model.

There is lots you can do, so DO IT!
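The repo doesn't spell out its exact scheme here, so as one reading of "reduce the skeleton to 2-bit", here is a minimal, hypothetical sketch of per-row 2-bit uniform quantization of a K/V tensor. All names below are my own invention, not the repo's API:

```python
# Hedged sketch: plain asymmetric 2-bit quantization of a cached K/V block.
# This is an illustration of the general idea, not the repo's actual method.
import numpy as np

def quantize_2bit(x: np.ndarray):
    """Per-row asymmetric 2-bit quantization: values mapped to 4 levels (0..3)."""
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = (hi - lo) / 3.0               # 2 bits -> 3 intervals between 4 levels
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant rows
    q = np.clip(np.round((x - lo) / scale), 0, 3).astype(np.uint8)
    return q, scale, lo

def dequantize_2bit(q: np.ndarray, scale: np.ndarray, lo: np.ndarray) -> np.ndarray:
    """Reconstruct an approximation of the original tensor."""
    return q.astype(np.float32) * scale + lo

# Stand-in "KV cache" block: (heads, seq_len, head_dim)
kv = np.random.randn(8, 128, 64).astype(np.float32)
q, s, z = quantize_2bit(kv)
kv_hat = dequantize_2bit(q, s, z)
# Rounding error is bounded by half a quantization step per row.
```

Four levels per value means the payload shrinks to 2 bits per element (plus a per-row scale and offset), which is where the "fits in 64 MB of cache" arithmetic starts to become plausible for small models.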

nihilistau/shannon-prime
