TL;DR: If you have auto-memory enabled (/memory → on), you might be paying double tokens on every message, invisibly and silently. Here’s why.

I’ve been seeing threads about random usage spikes, sessions eating 30-74% of weekly limits out of nowhere, first messages costing a fortune. Here’s at least one concrete technical explanation, from binary analysis of decompiled Claude Code (versions 2.1.74–2.1.83).

The mechanism: extractMemories

When auto-memory is on and a server-side A/B flag (tengu_passport_quail) is active on your account, Claude Code forks your entire conversation context into a separate, parallel API call after every user message. Its job is to analyze the conversation and save memories to disk. It fires while your normal response is still streaming.

Why this matters for cost: Anthropic’s prompt cache requires the first request to finish before a cache entry is ready. Since both requests overlap, the fork always gets a cache miss and pays full input token price. On a 200K token conversation, you’re paying ~400K input tokens per turn instead of ~200K.

It also can’t be cancelled. Other background tasks in Claude Code (like auto_dream) have an abortController. extractMemories doesn’t; it’s fire-and-forget. You interrupt the session, it keeps running. You restart, it keeps running. And it’s skipTranscript: true, so it never appears in your conversation log.

It can also accumulate. There’s a “trailing run” mechanism that fires a second fork immediately after the first completes, and it bypasses the throttle that would normally rate-limit extractions. On a fast session with rapid messages, extractMemories can effectively run on every single turn, or even 2-3x per message if Claude Code retries internally.

The fix

Run /memory in Claude Code and turn auto-memory off. That’s it. This blocks extractMemories entirely, regardless of the server-side flag.
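For anyone who wants to plug in their own numbers, here is a rough back-of-the-envelope sketch of the cost arithmetic described above. It is not taken from the decompiled code. The function names are made up for illustration, and the 0.1 cache-read multiplier is an assumption about how a cached read would be billed relative to a full-price read, so check it against current pricing before relying on it.

```python
def input_tokens_per_turn(context_tokens: int, forks_per_turn: int = 1) -> int:
    """Input tokens sent per turn: the main request plus each background fork.

    Assumes every fork re-sends the entire conversation context,
    as the post describes for extractMemories.
    """
    return context_tokens * (1 + forks_per_turn)


def fork_cost_in_full_price_tokens(
    context_tokens: int,
    cache_hit: bool,
    cache_read_multiplier: float = 0.1,  # assumed discount for cached reads
) -> float:
    """Cost of one fork, expressed as an equivalent number of full-price input tokens.

    On a cache miss the fork is billed for the whole context at full price;
    on a hit it is billed at the (assumed) cached-read rate.
    """
    multiplier = cache_read_multiplier if cache_hit else 1.0
    return context_tokens * multiplier


if __name__ == "__main__":
    context = 200_000  # the 200K-token conversation from the post

    # One overlapping fork per turn: roughly double the input tokens.
    print(input_tokens_per_turn(context, forks_per_turn=1))

    # What the same fork costs on a cache miss versus a hypothetical hit.
    print(fork_cost_in_full_price_tokens(context, cache_hit=False))
    print(fork_cost_in_full_price_tokens(context, cache_hit=True))
```

With forks_per_turn=1 this reproduces the ~400K versus ~200K figure. Bump forks_per_turn to 2 or 3 to see what the trailing-run or retry cases would look like under the same assumptions.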
If you’ve been hitting limits weirdly fast and you have auto-memory on, this is likely a significant contributor. Would be curious if anyone notices a difference after disabling it.
Originally posted by u/skibidi-toaleta-2137 on r/ClaudeCode
