Quick share, and full disclosure up front: this is my own project, so feel free to be skeptical.

Here’s the thing that always bugged me. Every time you ask an AI assistant about a long document, it reads the whole document again from scratch. Ask it ten questions about a 100-page report and it has basically read a thousand pages. That repeated reading is a big part of why long AI chats get slow and why the bills pile up.

The approach is pretty simple when you say it out loud. Instead of recomputing every time, you store what the model already read and put it back when it’s needed. The part I think is genuinely neat is that the restored version isn’t just “close enough”: it comes back identical down to the bit, and you can confirm that yourself with a checksum (the same idea you use to check that a download didn’t get corrupted).

A couple of things that make it a bit more than normal caching:

- You can check every claim yourself. The proofs are public hashes, run on open models from Meta, Alibaba and Mistral, so nobody is asking you to just trust them.
- The stored memory can move between different machines, and even between different GPU generations, and still give the same output.

To make the whole chain inspectable they also open-sourced a small AI model that was trained for about 600 euro. It’s tiny and honestly not trying to beat the big models. It’s just there so people can poke at every step.

I’ll be upfront that it’s a narrow claim, not magic. It doesn’t make a small model smart. It’s specifically about reusing an AI’s memory without losing anything. But the “you don’t need a bigger brain, you need a better memory” angle stuck with me.

Writeup with all the links and the proofs is here: https://tech.einnews.com/pr_news/917089794/corbenic-ai-releases-technology-that-eliminates-ai-s-largest-cost

Genuinely curious what people here think, especially folks who work on inference or KV caching. Is lossless reuse like this actually useful in practice, or do the current setups (vLLM, prefix caching, that kind of thing) already cover most of it?
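
For anyone who prefers the idea in code, here is a minimal sketch of "store it once, restore it, check it with a checksum". To be clear, this is not the project's code: `StateStore`, `get_state` and `_prefill` are made-up names, and the "state" is just a placeholder byte string standing in for whatever a real model would keep after reading the document. It only illustrates the bookkeeping.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Checksum helper, the same idea as verifying a download."""
    return hashlib.sha256(data).hexdigest()


class StateStore:
    """Toy store (hypothetical): document hash -> (state bytes, checksum)."""

    def __init__(self):
        self._entries = {}
        self.prefill_calls = 0  # how often the "expensive read" actually ran

    def _prefill(self, document: str) -> bytes:
        # Placeholder for the expensive step. A real system would run the
        # model over the document here; this just derives some bytes from it.
        self.prefill_calls += 1
        return hashlib.sha512(document.encode("utf-8")).digest()

    def get_state(self, document: str):
        key = sha256_hex(document.encode("utf-8"))
        if key not in self._entries:
            state = self._prefill(document)
            self._entries[key] = (state, sha256_hex(state))
        state, checksum = self._entries[key]
        # Verify on every restore: the bytes must match the recorded checksum.
        if sha256_hex(state) != checksum:
            raise ValueError("restored state does not match its checksum")
        return state, checksum


store = StateStore()
document = "one page of a long report\n" * 100

# Ten "questions" about the same document: the expensive read happens once.
results = [store.get_state(document) for _ in range(10)]
first_state, first_checksum = results[0]
all_identical = all(s == first_state and c == first_checksum for s, c in results)

print(store.prefill_calls, all_identical)  # prints: 1 True
```

The checksum comparison is what makes "identical down to the bit" something you can test rather than take on faith: if a single byte of the stored state changed, the hash would no longer match.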
Originally posted by u/MindPsychological140 on r/ArtificialInteligence
