Vectors are waveforms. Dot products are wave interference. I kept looking at attention through this lens.

In the attention mechanism, Q, K, and V all transform the same input and optimize the same loss. So why three separate matrices? The original paper offered no justification; it worked, so everyone adopted it.

My experiment: one unified matrix. A single projection, split into three bands. That's 67% fewer attention parameters.

I tested it at 484K parameters. The model tells coherent stories and runs at 700+ tokens/sec on CPU.

Demo: https://huggingface.co/spaces/Reinforce-ai/yocto-demo

Code: https://github.com/ReinforceAI/yocto

Small models run on laptops but lack quality; 7B models have quality but need servers. I'm building something that does both. Open source. Would love feedback.
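A minimal sketch of the "one projection, three bands" idea, under my reading of the post: a single d×d matrix replaces the three separate d×d matrices for Q, K, and V, and its output is split into three d/3-wide slices. The function name and shapes here are my own assumptions, not the yocto code itself. One d×d matrix versus three d×d matrices is where the ~67% parameter reduction comes from.

```python
import numpy as np

def unified_attention(x, W):
    """Hypothetical sketch: one projection matrix W (d x d) instead of
    separate W_q, W_k, W_v. Output is split into three equal bands.
    x: (T, d) token embeddings; returns (T, d // 3)."""
    d = W.shape[0]
    band = d // 3
    qkv = x @ W                           # single projection: (T, d)
    q, k, v = np.split(qkv, 3, axis=-1)   # three bands of width d // 3
    att = q @ k.T / np.sqrt(band)         # scaled dot-product scores
    att = np.exp(att - att.max(-1, keepdims=True))
    att /= att.sum(-1, keepdims=True)     # row-wise softmax
    return att @ v

# Parameter count vs. three separate d x d matrices:
d = 96
unified = d * d          # one shared projection
separate = 3 * d * d     # standard Q, K, V projections
print(1 - unified / separate)  # ~0.667, i.e. "67% fewer"
```

The trade-off in this reading is that Q, K, and V each live in a d/3-dimensional space rather than d, so the savings come partly from a narrower attention head, not only from weight sharing.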
Originally posted by u/Financial_Buy_2287 on r/ArtificialInteligence
