Original Reddit post

Follow-up to my post from a couple weeks ago about giving Claude Code persistent memory with a self-hosted mem0 MCP server. The project picked up some stars and forks since then, and a few of those forks added improvements for specific use cases. That inspired most of what’s in this update.

CLAUDE.md instructions like “search memories” are probabilistic: Claude can simply skip them. And after context compaction, injected memories get summarized away. For a memory system, that’s not good enough. v0.2.1 fixes that with session hooks.

What changed:

  1. SessionStart hook (the main thing)

     Instead of relying on CLAUDE.md, there’s now a shell hook that fires deterministically at session start. Zero conversation turns consumed, no prompt needed. It runs 2 semantic searches against Qdrant (15 results each), deduplicates by memory ID, caps at 20 memories, and injects them via `additionalContext`. The matcher is `"startup|compact"`, so it also fires after context compaction: when Claude compresses your conversation, memories get re-injected automatically.

     ```json
     {
       "hooks": {
         "SessionStart": [
           {
             "matcher": "startup|compact",
             "hooks": [
               {
                 "type": "command",
                 "command": "mem0-hook-context",
                 "timeout": 15000
               }
             ]
           }
         ]
       }
     }
     ```
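The post doesn’t show the internals of `mem0-hook-context`, but the dedupe-and-inject logic it describes can be sketched in a few lines of Python. The search stubs below stand in for the two real Qdrant queries, and the output shape assumes Claude Code’s documented SessionStart hook JSON (`hookSpecificOutput.additionalContext`):

```python
import json

MAX_MEMORIES = 20

def dedupe_and_cap(result_sets, cap=MAX_MEMORIES):
    """Merge ranked search results, dropping duplicate memory IDs,
    and keep at most `cap` memories."""
    seen, merged = set(), []
    for results in result_sets:
        for mem in results:
            if mem["id"] not in seen:
                seen.add(mem["id"])
                merged.append(mem)
    return merged[:cap]

def hook_output(memories):
    """Shape the stdout JSON a SessionStart hook uses to inject context."""
    bullets = "\n".join(f"- {m['text']}" for m in memories)
    return {
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": "Relevant memories from past sessions:\n" + bullets,
        }
    }

if __name__ == "__main__":
    # Stand-ins for the two 15-result semantic searches against Qdrant.
    search_a = [{"id": "m1", "text": "user prefers uv over pip"}]
    search_b = [{"id": "m1", "text": "user prefers uv over pip"},
                {"id": "m2", "text": "project targets Python 3.12"}]
    print(json.dumps(hook_output(dedupe_and_cap([search_a, search_b]))))
```

Because the matcher also covers `compact`, this same script re-runs after compaction with no extra logic needed.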
  2. Stop hook (safety net)

     When a session ends, the Stop hook reads the last ~3 exchanges from the transcript via a bounded deque and saves them to mem0 with `infer=True`. Even if Claude never called `add_memory` during the session, the important context gets captured. Trivial sessions are skipped (both user input under 20 chars and assistant response under 50 chars).

     One-command hook install:

     ```
     mem0-install-hooks           # current project
     mem0-install-hooks --global  # all projects
     ```

     Reads your existing `.claude/settings.json`, merges the hook config, and doesn’t touch anything else. Idempotent.
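The bounded-deque tail and the triviality check can be sketched like this. Function names are illustrative, not the project’s, and exactly how the length thresholds combine across multiple exchanges is my reading of the post:

```python
from collections import deque

MIN_USER_LEN, MIN_ASSISTANT_LEN = 20, 50

def tail_exchanges(exchanges, n=3):
    """Stream (user, assistant) pairs through a bounded deque so memory
    stays O(n) however long the transcript is; keep only the last n."""
    buf = deque(maxlen=n)
    for pair in exchanges:
        buf.append(pair)
    return list(buf)

def is_trivial(exchanges):
    """Skip saving when every user turn is under 20 chars AND every
    assistant reply is under 50 chars (the thresholds from the post)."""
    return all(
        len(user) < MIN_USER_LEN and len(assistant) < MIN_ASSISTANT_LEN
        for user, assistant in exchanges
    )
```

Anything that survives `is_trivial` would then be handed to mem0’s add call with `infer=True` so fact extraction still runs on it.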
  3. Ollama as main LLM

     The first version used Anthropic’s API for fact extraction. Now Ollama can handle everything locally: main LLM (qwen3:14b) + embeddings (bge-m3). Zero cloud dependencies.

     ```
     claude mcp add --scope user --transport stdio mem0 \
       --env MEM0_PROVIDER=ollama \
       --env MEM0_LLM_MODEL=qwen3:14b \
       --env MEM0_USER_ID=your-user-id \
       -- uvx --from git+https://github.com/elvismdev/mem0-mcp-selfhosted.git mem0-mcp-selfhosted
     ```
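For reference, the fully local setup maps onto a mem0 configuration roughly like the one below. The key names (`ollama_base_url`, the provider strings) follow mem0’s config schema as I understand it, so treat this as a sketch and check the repo for the exact mapping from the `MEM0_*` env vars:

```python
# Sketch of a fully local mem0 configuration: Ollama for both the LLM
# and embeddings, Qdrant as the vector store. Key names assume mem0's
# config schema; verify against the project before relying on them.
config = {
    "llm": {
        "provider": "ollama",
        "config": {
            "model": "qwen3:14b",
            "ollama_base_url": "http://localhost:11434",
        },
    },
    "embedder": {
        "provider": "ollama",
        "config": {
            "model": "bge-m3",
            "ollama_base_url": "http://localhost:11434",
        },
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {"host": "localhost", "port": 6333},
    },
}

# With mem0 installed, the server would then build its client along
# the lines of:
#   from mem0 import Memory
#   memory = Memory.from_config(config)
```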
  4. OAT token auto-refresh

     OAT tokens expire. The server now checks expiry before every API call (30-minute window) and refreshes proactively. If a 401 still happens, there’s a 3-step fallback: re-read the credentials file, self-refresh via the OAuth endpoint, then wait and retry.
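A sketch of the proactive check plus the 401 fallback chain. The `Unauthorized` exception and the recovery callables are placeholders for whatever the server actually raises and runs:

```python
import time

REFRESH_WINDOW_S = 30 * 60  # refresh if the token expires within 30 minutes

def needs_refresh(expires_at, now=None, window=REFRESH_WINDOW_S):
    """Checked before every API call: refresh proactively when the
    token's expiry falls inside the window."""
    now = time.time() if now is None else now
    return expires_at - now <= window

class Unauthorized(Exception):
    """Placeholder for a 401 response from the API."""

def call_with_fallback(call, recovery_steps):
    """On a 401, walk the fallback chain, retrying the call once after
    each recovery step; re-raise if every step fails."""
    try:
        return call()
    except Unauthorized:
        for recover in recovery_steps:
            recover()
            try:
                return call()
            except Unauthorized:
                continue
        raise
```

The three recovery callables would be, in order: re-read the credentials file, self-refresh via the OAuth endpoint, and a sleep-then-retry.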
  5. DRY configuration

     Before: 6+ env vars for a fully local setup. Now:

     ```
     --env MEM0_PROVIDER=ollama \
     --env MEM0_OLLAMA_URL=http://localhost:11434/
     ```

     `MEM0_PROVIDER` cascades to all sub-providers. `MEM0_OLLAMA_URL` cascades to all URL settings.

If you installed via uvx, just clear the cache and restart Claude Code:

```
uv cache clean mem0-mcp-selfhosted
mem0-install-hooks
```

uvx will pull the latest version on next launch.

What I’d like feedback on:

- If you’re running the first version, does the hook install work cleanly on your existing settings?
- Does the SessionStart hook make memory injection consistent for you?
- Anyone running Ollama as the main LLM? How’s the JSON reliability holding up?
- The Stop hook captures the last ~3 exchanges. Too few? Too many?

GitHub: https://github.com/elvismdev/mem0-mcp-selfhosted

Issues and PRs welcome.
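For the curious, the cascade described in item 5 amounts to a simple fallback resolution: each sub-setting defaults to the top-level value unless explicitly overridden. The sub-setting names below are illustrative, not the project’s actual variable names:

```python
DEFAULT_OLLAMA_URL = "http://localhost:11434"

def cascade(env):
    """Resolve sub-settings from the two top-level vars, letting an
    explicit sub-setting win over the cascaded default.
    Sub-setting key names here are hypothetical."""
    provider = env.get("MEM0_PROVIDER", "anthropic")
    url = env.get("MEM0_OLLAMA_URL", DEFAULT_OLLAMA_URL)
    return {
        "llm_provider": env.get("MEM0_LLM_PROVIDER", provider),
        "embedder_provider": env.get("MEM0_EMBEDDER_PROVIDER", provider),
        "llm_url": env.get("MEM0_LLM_URL", url),
        "embedder_url": env.get("MEM0_EMBEDDER_URL", url),
    }
```

Two env vars in, a full sub-provider configuration out; a per-component override (say, pointing only the LLM at a remote GPU box) still takes precedence.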

Originally posted by u/Aware-One7480 on r/ClaudeCode