Original Reddit post

Been deep in the LLM memory space for months, and I keep seeing the same pattern: everyone is building better ways to store and retrieve memories, but almost nobody is solving the actual bottleneck — getting the right memory into context at the right time.

Here’s the core issue: agents don’t know to ask for what they don’t know they have. Tool-based memory (“call recall() when you need context”) is fundamentally broken because the agent has to already know something is relevant before requesting it. It’s like telling someone with amnesia “just ask me if you forgot something.” If they forgot, they don’t know to ask.

I’ve been experimenting with three approaches to solve this:

  1. Proactive injection at session start. Instead of waiting for the agent to search, automatically inject a compressed user profile + active workflows + pending reminders into context before the first message. The agent starts every turn already knowing what matters. MCP resources make this possible — memory as a readable resource, not just a callable tool.
  2. Typed memory with different retrieval algorithms. Tulving’s taxonomy from the 1970s still holds: semantic (facts), episodic (events), and procedural (workflows) need fundamentally different search strategies. “What does the user prefer?” is keyword/embedding search. “What happened last week?” is time-range filtering with decay. “How do we deploy?” is step-sequence matching with success rate weighting. Treating all three as “embed and cosine-search” is like using a hammer for screws.
  3. Background extraction, not on-demand. Most systems extract memory when the user explicitly saves something. But the richest signal comes from conversations the user never thought to save. Running extraction asynchronously after every interaction catches things like “oh, they mentioned switching from Python to Rust” that no one would manually tag as a memory.

The Titans architecture from Google (test-time weight updates) is interesting but orthogonal: it improves what happens inside a single model session. It doesn’t solve cross-session, cross-model, or cross-agent memory. Your Gemini Titans instance learns something, but Claude doesn’t know it. Agent A learns something; Agent B can’t access it.

A few open questions I’m still working through:

- How do you handle memory contradictions at scale? “User prefers Python” from 6 months ago vs. “User switched to Rust” from last week. Temporal decay helps but doesn’t fully solve it.
- Is there a ceiling on how much proactive context you can inject before it becomes noise? I’ve found ~2-3K tokens of profile + procedures works well, but beyond that the agent starts losing focus.
- Has anyone successfully implemented procedural memory with reinforcement, where the system tracks which workflows actually succeeded vs. failed and adjusts confidence accordingly?

Curious what approaches others have tried. The memory-for-agents space is moving fast but feels like it’s still mostly “better RAG” rather than rethinking the architecture.
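To make the proactive-injection idea (approach 1) concrete, here is a minimal sketch of session-start assembly under a token budget. Everything is illustrative: the `profile`/`workflows`/`reminders` inputs, the function names, and the crude 4-characters-per-token estimate are my assumptions, not a real MCP resource handler.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def build_session_context(profile: dict, workflows: list, reminders: list,
                          budget_tokens: int = 2500) -> str:
    """Assemble a compressed preamble (profile first, then workflows, then
    reminders), dropping whole sections once the token budget is reached."""
    sections = []
    if profile:
        body = "\n".join(f"- {k}: {v}" for k, v in profile.items())
        sections.append("## User profile\n" + body)
    if workflows:
        sections.append("## Active workflows\n" + "\n".join(f"- {w}" for w in workflows))
    if reminders:
        sections.append("## Pending reminders\n" + "\n".join(f"- {r}" for r in reminders))

    kept, used = [], 0
    for section in sections:
        cost = estimate_tokens(section)
        if used + cost > budget_tokens:
            break  # past ~2-3K tokens the extra context becomes noise
        kept.append(section)
        used += cost
    return "\n\n".join(kept)
```

The resulting string gets injected before the first user message, so the agent never has to know to ask.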
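The per-type retrieval routing from approach 2 might look something like the sketch below. The scorers are deliberately crude stand-ins (word overlap instead of embeddings, a fixed 30-day half-life for decay), and the `Memory` shape and function names are assumptions:

```python
import math
import time
from dataclasses import dataclass

@dataclass
class Memory:
    kind: str            # "semantic" | "episodic" | "procedural"
    text: str
    timestamp: float = 0.0
    successes: int = 0
    failures: int = 0

def semantic_score(query: str, m: Memory) -> float:
    # Stand-in for keyword/embedding search: fraction of query words present.
    q, t = set(query.lower().split()), set(m.text.lower().split())
    return len(q & t) / max(1, len(q))

def episodic_score(now: float, m: Memory, half_life_days: float = 30.0) -> float:
    # Time-based relevance with exponential decay.
    age_days = (now - m.timestamp) / 86400
    return math.exp(-math.log(2) * age_days / half_life_days)

def procedural_score(m: Memory) -> float:
    # Success-rate weighting, Laplace-smoothed so unproven workflows start near 0.5.
    return (m.successes + 1) / (m.successes + m.failures + 2)

def recall(memories, kind, query="", now=None, k=3):
    now = time.time() if now is None else now
    scorer = {
        "semantic": lambda m: semantic_score(query, m),
        "episodic": lambda m: episodic_score(now, m),
        "procedural": procedural_score,
    }[kind]
    pool = [m for m in memories if m.kind == kind]
    return sorted(pool, key=scorer, reverse=True)[:k]
```

The point is the dispatch, not the scorers: each memory type gets its own ranking function instead of one cosine search over everything.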
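On the contradictions question, one partial answer is recency-wins-but-keep-history: the newest value per key becomes current, and older values are retained as superseded rather than deleted. The `resolve_contradictions` name and the `(key, value, timestamp)` shape are my invention, just to show the idea:

```python
def resolve_contradictions(facts):
    """facts: list of (key, value, timestamp) tuples. The newest value wins
    per key; older values are kept as superseded history, so the agent can
    still answer "what changed?" instead of silently forgetting."""
    by_key = {}
    for key, value, ts in sorted(facts, key=lambda f: f[2]):
        entry = by_key.setdefault(key, {"current": None, "history": []})
        if entry["current"] is not None:
            entry["history"].append(entry["current"])
        entry["current"] = (value, ts)
    return by_key
```

This sidesteps pure temporal decay for hard contradictions: “prefers Python” isn’t slowly down-weighted, it’s explicitly superseded by “switched to Rust.”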
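And for the procedural-reinforcement question, a sketch of one possible scheme: decayed success/failure counts feeding a Beta-style confidence. The decay factor, the smoothing, and both function names are assumptions, not a tested design:

```python
def update_evidence(successes: float, failures: float, succeeded: bool,
                    decay: float = 0.95) -> tuple:
    # Decay old evidence so recent outcomes dominate, then record the new one.
    s, f = successes * decay, failures * decay
    return (s + 1, f) if succeeded else (s, f + 1)

def workflow_confidence(successes: float, failures: float) -> float:
    # Laplace-smoothed mean of a Beta posterior: unproven workflows sit at 0.5.
    return (successes + 1) / (successes + failures + 2)
```

A workflow that keeps failing drifts below 0.5 and can be deprioritized at recall time, while the decay keeps a long-ago success streak from masking recent breakage.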

Originally posted by u/No_Advertising2536 on r/ArtificialInteligence