Original Reddit post

I’ve been building a memory architecture for local LLMs called EigenFlame, and I think it’s different enough from standard RAG to be worth sharing.

**What problem does this solve?**

Standard LLM memory is either nonexistent (stateless chat) or flat (dump everything into context and hope). Neither accumulates understanding over time; they accumulate data. EigenFlame is an attempt at something closer to how understanding actually compounds: not by remembering more, but by distilling what was remembered into something denser and more stable at each layer.

**The core idea**

Standard RAG retrieval is flat: every stored exchange competes equally by cosine similarity, and recency usually wins. EigenFlame does something different. After enough exchanges, it runs a synthesis cascade:

- Raw conversations → beliefs (cross-episode patterns)
- Beliefs → identity (who this entity has become)
- Identity versions → meta-pattern (how understanding is shifting)
- Meta-patterns → archetype (the invariant beneath all change)

Each layer is weighted using figurate numbers from Pascal’s triangle: a synthesised identity statement outweighs a raw episode not because it’s older, but because it survived compression and represents distilled understanding.

There’s also a seed: a phrase you set at session creation, embedded as a vector, immutable. Every query bends toward it before retrieval. Once an archetype crystallises, it becomes a second gravitational anchor. The system is pulled toward both its origin and what it has become.

**Honest caveats**

- Vibe-coded. The architecture and ideas are entirely mine; Claude helped with the implementation (thank you!).
- Only Ollama is supported right now (which I consider a feature: nothing leaves your machine).
- Synthesis quality depends heavily on model capability. 8B+ models produce significantly better results than 3–4B ones. I mostly used Qwen3.5:9b, Ministral-3:8b and Gemma3:12b.
- This is garage research.
It works, it’s interesting, and I don’t know yet whether it scales to everything I imagine it could.

**Stack**

FastAPI + ChromaDB + Ollama. Vanilla React via CDN Babel, so there is no build step. Runs entirely locally, with no cloud dependencies.

**What’s next**

I’m building EigenResearch on the same architecture: instead of a conversational agent, you feed it notes and a question, and it synthesises an answer through the same cascade. I’ll release that separately once it’s properly tested.

**Links**

GitHub: https://github.com/latentweb/EigenFlame

Personal Research Lab: https://latentweb.com/

Happy to answer questions about the architecture. I’m especially curious whether anyone has tried something similar to the synthesis cascade approach.

I’m currently looking for work: ML engineering, AI systems, anything in this space. If this resonates with what you’re building, feel free to DM me or reach me at research@latentweb.com.

Originally posted by u/crazy4donuts4ever on r/ArtificialInteligence