How do LLMs store so much knowledge? A look at feature superposition

thethoughtprocess.xyz

eifachposteMB to AI (Reddit RSS) · English · 3 hours ago

How Does LLMs Store Knowledge? A Deep Dive Into Feature Superposition

Ask ChatGPT about quantum physics, medieval history, or cooking, and it delivers precise answers, even offline. How does it know so much? The secret is feature superposition, a mechanism allowing AI to compress vast knowledge into a finite space. This deep dive explores how AI stores knowledge using this fascinating property. The Foundation: Disentangling Features […]

Original Reddit post

I’ve been reading more about how LLMs store information internally, and what I’ve learned is fascinating. The core idea: instead of one neuron encoding one concept, concepts are encoded as quasi-orthogonal directions in activation space. This quasi-orthogonality is the key to explaining model capacity. Not only are concepts spread across many neurons, but completely separate concepts are allowed to have a small non-zero correlation. Because the number of nearly orthogonal directions that fit in a space grows exponentially with its dimension, this tolerance lets a model represent far more concepts than it has neurons. This is called feature superposition. To me, it is very similar to how our genes encode information. And it is not done on purpose by LLM makers: it is an emergent property of the training process and architecture, which is quite fascinating and very unexpected. If you want to read more about this, I’ve written a complete deep dive on the subject: https://thethoughtprocess.xyz/en/how-does-llms-store-knowledge-a-deep-dive-into-feature-superposition

Submitted by /u/Kindly-Hawk
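To make the quasi-orthogonality idea concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not how any real LLM is trained or inspected: random unit vectors stand in for learned feature directions, the sizes (2000 features in 512 dimensions) are arbitrary, and the function names `random_unit_vectors`, `max_interference`, and `recover_active` are hypothetical helpers made up for this example. It shows that you can pack many more directions than dimensions while keeping their overlaps small, and that a few active features can still be read back from their sum.

```python
import numpy as np


def random_unit_vectors(n_features, n_dims, seed=0):
    """Draw n_features random directions in n_dims dimensions (unit length).

    Assumption: random directions stand in for learned feature directions.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((n_features, n_dims))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def max_interference(vectors):
    """Largest absolute cosine similarity between two distinct directions."""
    gram = vectors @ vectors.T      # pairwise dot products (cosines for unit vectors)
    np.fill_diagonal(gram, 0.0)     # ignore each vector's similarity with itself
    return float(np.abs(gram).max())


def recover_active(vectors, active, threshold=0.5):
    """Superpose a few active features, then read them back by projection."""
    x = vectors[active].sum(axis=0)  # one activation vector holding several features
    scores = vectors @ x             # near 1 for active features, near 0 for the rest
    return sorted(np.flatnonzero(scores > threshold).tolist())


if __name__ == "__main__":
    # Roughly four times more "features" than dimensions.
    vecs = random_unit_vectors(n_features=2000, n_dims=512)
    print("max |cosine| between distinct features:", max_interference(vecs))
    print("recovered features:", recover_active(vecs, [3, 57, 1200]))
```

The readout only works because few features are active at once. If many were active together, their small overlaps would add up and swamp the signal, which is why sparsity matters as much as near-orthogonality in this picture.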

Originally posted by u/Kindly-Hawk on r/ArtificialInteligence
