Original Reddit post

Disclosure: I built this and it’s open source.

Every AI tool has the same problem: close the chat and it forgets everything. The built-in memory features that exist are black boxes. You can’t search them, audit them, or control what gets stored.

What I built: An MCP server that runs on Cloudflare Workers and gives any MCP-compatible AI client persistent, searchable memory. Five tools: remember, recall, list_recent, forget, append.

How it works: Every note gets embedded using bge-small-en-v1.5 on Workers AI and stored in Cloudflare Vectorize as a 384-dimensional vector. Recall queries by cosine similarity, so retrieval works by meaning, not keywords. “Users dropping off at checkout” surfaces when you search “conversion problems”, with no keyword overlap needed.

Long notes are chunked at sentence boundaries with a 200-character overlap before embedding. Each section gets its own vector rather than one diluted embedding for the whole note.

Duplicate detection runs before every store. Above 95% similarity the write is blocked. Between 85-95% it’s stored but flagged. This stops the brain from filling up with repeated context across sessions.

The append tool handles updates. When something changes, it adds to an existing entry with a timestamp rather than creating a conflicting duplicate.

Write pattern: The D1 write happens synchronously in the request, so the response is instant. The Vectorize embedding runs in the background via ctx.waitUntil(), so capture stays fast.

Limitations: No dashboard yet. Browsing memory is raw JSON from an endpoint. Vectorize and Workers AI don’t run in local wrangler dev, so you need --remote for real testing. ChatGPT MCP support is in beta via Developer Mode, for Plus/Pro users only.

Stack: Cloudflare Workers, D1, Vectorize, Workers AI. Free tier.
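To make the chunking and duplicate-check steps concrete, here is a minimal TypeScript sketch. It is not the project's actual code. The function names, the 1000-character default chunk size, and the character-based way the overlap is taken are all assumptions for illustration. Only the 200-character overlap and the 85%/95% thresholds come from the post, and "95% similarity" is read here as a cosine similarity above 0.95.

```typescript
// Illustrative sketch only: names and defaults are hypothetical, not the project's API.

type DuplicateVerdict = "blocked" | "flagged" | "stored";

/** Cosine similarity between two equal-length vectors. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Thresholds from the post: above 0.95 block, 0.85 to 0.95 store but flag. */
function classifyDuplicate(similarity: number): DuplicateVerdict {
  if (similarity > 0.95) return "blocked";
  if (similarity >= 0.85) return "flagged";
  return "stored";
}

/** Split after '.', '!' or '?' when followed by whitespace or end of text. */
function splitSentences(text: string): string[] {
  const sentences: string[] = [];
  let start = 0;
  for (let i = 0; i < text.length; i++) {
    const endsSentence = ".!?".indexOf(text[i]) !== -1;
    const atBoundary = i + 1 === text.length || /\s/.test(text[i + 1]);
    if (endsSentence && atBoundary) {
      sentences.push(text.slice(start, i + 1).trim());
      start = i + 1;
    }
  }
  const rest = text.slice(start).trim();
  if (rest) sentences.push(rest);
  return sentences;
}

/**
 * Pack whole sentences into chunks of roughly maxChars (an assumed value).
 * Each new chunk is seeded with the last overlapChars characters of the
 * previous one, so the seed may begin mid-sentence.
 */
function chunkBySentence(text: string, maxChars = 1000, overlapChars = 200): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const sentence of splitSentences(text)) {
    if (current && current.length + 1 + sentence.length > maxChars) {
      chunks.push(current);
      // slice(-0) would return the whole string, so guard the zero case
      current = overlapChars > 0 ? current.slice(-overlapChars) : "";
    }
    current = current ? current + " " + sentence : sentence;
  }
  if (current) chunks.push(current);
  return chunks;
}
```

In a setup like the one described, each chunk would be embedded separately, and the best similarity score against existing vectors would be passed to something like classifyDuplicate before deciding whether to write. Splitting on a period only when whitespace follows keeps names like bge-small-en-v1.5 in one piece.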

Originally posted by u/rahilpirani5 on r/ArtificialInteligence