Original Reddit post

I’ve been developing long, detailed conversations with AI (story development, system design, deep project planning), and I kept hitting the same wall: as conversations grow longer, models start hallucinating.

- Characters appear that were never introduced
- Decisions we locked in get “reconsidered” randomly
- Constraints get ignored
- Previously rejected options reappear

And this isn’t just storytelling. It happens in:

- Project development
- Workout planning
- Technical architecture
- Personal advice
- Even casual long chats

This isn’t a creativity problem. It’s a continuity problem.

## The Core Issue

LLMs don’t actually “remember.” They only see a fixed-size context window. When earlier tokens fall out, the model fills the gaps with statistically plausible guesses. That’s hallucination.

More context tokens won’t truly fix this, because even with more tokens:

- Everything has equal weight
- No prioritization exists
- No authority hierarchy exists

What’s missing isn’t memory size. It’s memory structure.

## The Human Analogy

Humans don’t remember every word of a conversation. We compress experiences into:

- Important facts
- Decisions
- Constraints
- Intent
- Emotional signals

Our subconscious stores meaning, not transcripts. AI systems mostly store transcripts. That’s the flaw.

## The Proposal: A Sub-Context Layer

Instead of relying purely on raw chat history, introduce a conversation-scoped Sub-Context layer that stores only:

- **Intent** (why this conversation exists)
- **Constraints** (hard boundaries that must not be violated)
- **Decisions** (resolved forks that shouldn’t reopen randomly)
- **Facts** (stable truths established in-session)
- **Preferences** (interaction style & tone signals)
- **Open Loops** (unresolved threads)

This is not long-term memory. This is not user profiling. It’s a temporary, authoritative semantic layer for a single conversation window.
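To make the proposal concrete, here is a minimal sketch of what that schema could look like as a plain data structure. All names (`SubContext`, `render`) are my own illustration, not from any existing system; the fields mirror the six categories listed above, and `render()` serializes them into a preamble that could be injected ahead of raw chat history:

```python
from dataclasses import dataclass, field


@dataclass
class SubContext:
    """Conversation-scoped semantic layer: one instance per chat session."""

    intent: str = ""                                        # why this conversation exists
    constraints: list[str] = field(default_factory=list)    # hard boundaries
    decisions: dict[str, str] = field(default_factory=dict) # resolved forks: topic -> choice
    facts: list[str] = field(default_factory=list)          # stable in-session truths
    preferences: list[str] = field(default_factory=list)    # tone / style signals
    open_loops: list[str] = field(default_factory=list)     # unresolved threads

    def render(self) -> str:
        """Serialize the non-empty fields into an authoritative prompt preamble."""
        sections = [
            ("INTENT", [self.intent] if self.intent else []),
            ("CONSTRAINTS (must not be violated)", self.constraints),
            ("DECISIONS (do not reopen)",
             [f"{topic}: {choice}" for topic, choice in self.decisions.items()]),
            ("FACTS", self.facts),
            ("PREFERENCES", self.preferences),
            ("OPEN LOOPS", self.open_loops),
        ]
        lines: list[str] = []
        for title, items in sections:
            if items:  # skip empty sections so the preamble stays compact
                lines.append(f"## {title}")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```

The point of the dataclass framing is that the layer stays small and typed: it holds a handful of commitments, not a transcript, so it can be re-injected on every turn without eating the context window.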
## Pipeline Change

Instead of:

    User Prompt + Chat History → Model → Response

it becomes:

    User Prompt → Sub-Context Recall + Recent Chat → Model → Response → Sub-Context Update

Key rule: Sub-Context has higher authority than raw chat history. If there’s a conflict, Sub-Context wins.

## Why This Would Reduce Hallucination Everywhere

Without Sub-Context: the model loses earlier constraints → fills gaps → hallucination.

With Sub-Context: the model loses old tokens → still sees structured commitments → bounded reasoning.

Creativity becomes constrained imagination instead of random guessing.

## This Isn’t Just a Story Problem

- In code conversations: stops nonexistent APIs from reappearing
- In fitness conversations: prevents unsafe advice contradicting earlier injuries
- In business planning: stops re-suggesting rejected strategies
- In casual chats: prevents personality drift

## Bigger Windows Aren’t the Real Fix

Even with infinite tokens, the model doesn’t know what matters. A Sub-Context layer introduces:

- Priority
- Stability
- Constraint enforcement
- Semantic compression

Basically: a cognitive spine for the conversation.

I originally explored this idea in detail while formalizing a generic sub-context schema and update rules (sub context memory layer.docx).

Curious what people here think:

- Is this already being explored deeply in architecture-level AI systems?
- Is RAG enough for this, or does this require a new layer?
- Would this meaningfully reduce hallucination, or just shift the problem?

I’m genuinely interested in pushing this further at a systems-design level. Because right now, long conversations with LLMs feel smart, but fragile. And fragility feels architectural.
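The recall → model → update loop described in the post can be sketched as a single turn handler. This is an illustration under stated assumptions, not a real API: `call_model` stands in for any chat-completion endpoint, `sub_context` is any object exposing the fields and `render()` serialization of the schema above, and the update step is a toy heuristic (explicit `decision:` / `constraint:` prefixes) where a production system would run a second extraction pass over the exchange:

```python
def run_turn(user_prompt, recent_chat, sub_context, call_model):
    """One conversation turn: Sub-Context Recall -> Model -> Sub-Context Update."""
    # 1. Sub-Context Recall: inject the structured layer ahead of raw history,
    #    as a system message so it outranks the transcript when they conflict.
    messages = [
        {"role": "system",
         "content": "Authoritative session memory. If this conflicts with "
                    "chat history, this wins.\n" + sub_context.render()},
        *recent_chat[-10:],  # only a short tail of raw history is needed
        {"role": "user", "content": user_prompt},
    ]

    # 2. Model call (any chat-completion backend).
    response = call_model(messages)

    # 3. Sub-Context Update: promote new commitments out of the transcript.
    #    Toy heuristic; a real system would extract these automatically.
    lowered = user_prompt.lower()
    if lowered.startswith("decision:"):
        sub_context.decisions[user_prompt[len("decision:"):].strip()] = "locked"
    elif lowered.startswith("constraint:"):
        sub_context.constraints.append(user_prompt[len("constraint:"):].strip())

    recent_chat.append({"role": "user", "content": user_prompt})
    recent_chat.append({"role": "assistant", "content": response})
    return response
```

Placing the sub-context in the system role is one way to encode the "Sub-Context wins on conflict" rule, since most chat models weight system instructions above conversational history; whether that authority actually holds under pressure is exactly the kind of thing that would need evaluation.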

Originally posted by u/revived_soul_37 on r/ArtificialInteligence