Original Reddit post

Tired of your AI coding assistant forgetting everything the moment you hit the context limit? I built AHME to solve exactly that.

What it does: AHME runs as a local sidecar daemon next to your AI coding assistant. While you work, it quietly compresses your conversation history into a dense "Master Memory Block" using a local Ollama model: fully offline, zero cloud, zero cost.

How it works:

  • Your conversations get chunked and queued in a local SQLite database
  • When the CPU is idle, a small local model (qwen2:1.5b, gemma3:1b, phi3, etc.) compresses them into structured JSON summaries
  • Those summaries are recursively merged via a tree-reduce algorithm into one dense Master Memory Block
  • The result is written to .ahme_memory.md (for any file-reading tool) and exposed via MCP tools

The killer pattern: when you're approaching your context limit, call get_master_memory. It returns the compressed summary, resets the engine, and re-seeds it with that summary. Every new session starts from a dense checkpoint, not a blank slate.

Compatible with: Claude Code, Cursor, Windsurf, Kilo Code, Cline/Roo, Antigravity, and basically anything else that supports MCP or can read a markdown file.

Tech stack: Python 3.11+ · Ollama · SQLite · MCP (stdio + SSE) · tiktoken for real BPE chunking · psutil for CPU-idle gating

Why local-first?
  • Your code never leaves your machine
  • No API costs
  • Works offline
  • Survives crashes (SQLite persistence)

It's on GitHub: search DexopT/AHME-MCP. 19 tests, all passing. MIT license. Feedback and contributions are very welcome! Happy to answer any questions about the architecture or design decisions.
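[Editor's note] The queue-then-compress-when-idle loop described above can be sketched roughly as follows. This is not AHME's actual code: the table name, column names, and the 20% idle threshold are illustrative assumptions, and the CPU check is injected as a callable so the sketch runs without psutil (the real daemon would pass something like `lambda: psutil.cpu_percent(interval=0.5)`).

```python
import sqlite3

IDLE_THRESHOLD = 20.0  # percent CPU; hypothetical cutoff, not from the post

def process_one_chunk(conn, compress, cpu_percent):
    """Compress the oldest queued chunk, but only when the CPU is mostly idle.

    cpu_percent is a zero-argument callable returning current CPU load,
    e.g. lambda: psutil.cpu_percent(interval=0.5) in the real daemon.
    """
    if cpu_percent() > IDLE_THRESHOLD:
        return False  # machine is busy; leave the queue alone for now
    row = conn.execute(
        "SELECT id, text FROM chunks WHERE summary IS NULL ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False  # nothing pending
    chunk_id, text = row
    summary = compress(text)  # the real engine calls a local Ollama model here
    conn.execute("UPDATE chunks SET summary = ? WHERE id = ?", (summary, chunk_id))
    conn.commit()
    return True

# Demo with an in-memory database and a toy "compressor".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, summary TEXT)")
conn.execute("INSERT INTO chunks (text) VALUES ('long conversation chunk')")
processed = process_one_chunk(conn, compress=lambda t: t[:4], cpu_percent=lambda: 3.0)
```

Because unprocessed chunks stay in SQLite until they are summarized, this shape also explains the crash-survival claim: a restart just resumes draining the queue.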
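[Editor's note] The tree-reduce merge named in the steps above can be sketched as pairwise merging of summaries until a single block remains. The post doesn't show AHME's implementation, so `merge_pair` here is a stand-in for what would really be a call to the local model producing a compressed JSON summary.

```python
from typing import Callable

def tree_reduce(chunks: list[str], merge_pair: Callable[[str, str], str]) -> str:
    """Recursively merge summaries pairwise until one dense block remains."""
    if not chunks:
        return ""
    level = chunks
    while len(level) > 1:
        merged = []
        # Merge adjacent pairs; an odd leftover is carried to the next level.
        for i in range(0, len(level) - 1, 2):
            merged.append(merge_pair(level[i], level[i + 1]))
        if len(level) % 2 == 1:
            merged.append(level[-1])
        level = merged
    return level[0]

# Toy merge that concatenates, to make the merge tree visible:
result = tree_reduce(["a", "b", "c", "d", "e"], lambda x, y: f"({x}+{y})")
# → "(((a+b)+(c+d))+e)"
```

The appeal of the tree shape over a single left-to-right fold is that each merge call only ever sees two summaries, so no step needs the full history in one prompt.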
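[Editor's note] The "killer pattern" (compress, reset, re-seed) can be sketched as a checkpoint cycle. `MemoryEngine` and its method names are hypothetical stand-ins for illustration, not AHME's real API.

```python
class MemoryEngine:
    """Hypothetical stand-in for the memory engine; names are illustrative."""
    def __init__(self):
        self.history = []        # queued conversation chunks
        self.master_memory = ""  # the dense Master Memory Block

    def get_master_memory(self):
        return self.master_memory

    def reset(self):
        self.history.clear()

    def seed(self, summary):
        self.history.append(summary)

def checkpoint(engine):
    """Compress, reset, re-seed: the next session starts from a dense summary."""
    memory = engine.get_master_memory()
    engine.reset()       # drop the raw history
    engine.seed(memory)  # re-seed so nothing starts from a blank slate
    return memory

engine = MemoryEngine()
engine.master_memory = "dense summary of everything so far"
restored = checkpoint(engine)
```

After the call, the engine's history contains exactly one item, the checkpoint summary, which is what makes the new session start "from a dense checkpoint, not a blank slate."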

Originally posted by u/DexopT on r/ClaudeCode