Tired of your AI coding assistant forgetting everything the moment you hit the context limit? I built AHME to solve exactly that.

What it does: AHME sits as a local sidecar daemon next to your AI coding assistant. While you work, it quietly compresses your conversation history into a dense “Master Memory Block” using a local Ollama model: fully offline, zero cloud, zero cost.

How it works:
- Your conversations get chunked and queued in a local SQLite database
- When the CPU is idle, a small local model (qwen2:1.5b, gemma3:1b, phi3, etc.) compresses them into structured JSON summaries
- Those summaries are recursively merged via a tree-reduce algorithm into one dense Master Memory Block
- The result is written to `.ahme_memory.md` (for any file-reading tool) and exposed via MCP tools

The killer pattern: When you’re approaching your context limit, call `get_master_memory`. It returns the compressed summary, resets the engine, and re-seeds it with that summary. Every new session starts from a dense checkpoint, not a blank slate.

Compatible with: Claude Code, Cursor, Windsurf, Kilo Code, Cline/Roo, Antigravity; basically anything that supports MCP or can read a markdown file.

Tech stack: Python 3.11+ · Ollama · SQLite · MCP (stdio + SSE) · tiktoken for real BPE chunking · psutil for CPU-idle gating

Why local-first?
- Your code never leaves your machine
- No API costs
- Works offline
- Survives crashes (SQLite persistence)

It’s on GitHub: search DexopT/AHME-MCP. 19 tests, all passing. MIT license.

Feedback and contributions very welcome! Happy to answer any questions about the architecture or design decisions.
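
For anyone curious what the tree-reduce step under "How it works" looks like in practice, here is a minimal sketch. It is not AHME's actual code: `tree_reduce`, `fake_merge`, and the `fan_in` parameter are hypothetical names chosen for illustration, and `fake_merge` is a stand-in for the call to the local model that would really condense each batch.

```python
from typing import Callable, List


def tree_reduce(
    summaries: List[str],
    merge: Callable[[List[str]], str],
    fan_in: int = 4,
) -> str:
    """Merge summaries level by level until a single block remains.

    `merge` is assumed to take a small batch of summaries and return one
    combined summary (in a real system, a local model call).
    """
    if not summaries:
        raise ValueError("need at least one summary")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")

    level = list(summaries)
    while len(level) > 1:
        # Group neighbouring summaries into batches of `fan_in` and merge
        # each batch, producing the next (shorter) level of the tree.
        level = [
            merge(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


def fake_merge(batch: List[str]) -> str:
    # Hypothetical stand-in for the model: joins the batch instead of
    # actually compressing it.
    return " | ".join(batch)


if __name__ == "__main__":
    chunks = ["a", "b", "c", "d", "e"]
    print(tree_reduce(chunks, fake_merge, fan_in=2))
```

The point of merging in small batches rather than all at once is that each merge call only ever sees `fan_in` summaries, so the input stays within a small model's context window no matter how long the history gets.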
Originally posted by u/DexopT on r/ClaudeCode
