Original Reddit post

Spoiler: none of this is groundbreaking; it was all hiding in plain sight.

What eats tokens the most:

  • Image analysis and Playwright. Screenshots = thousands of tokens each. Playwright is great and worth it, just be aware.
  • Early project phase. When Claude explores a codebase for the first time, there's a massive IN/OUT spike. Once the cache kicks in, it stabilizes; the cache hit ratio reaches ~99% within minutes.
  • Agent spawning. Every subagent gets partial context and generates its own tokens. Think twice before spawning 5 agents for something 2 could handle.
  • Unnecessary plugins. Each one injects its schema into the system prompt. More plugins = bigger context = more tokens on every single message. Keep it lean.

Numbers I'm seeing (Opus 4.6):

  • 5h window total capacity: estimated ~1.8-2.2M tokens (IN+OUT combined, excluding cache)
  • 7d window capacity: early data suggests ~11-13M (only one full window so far, need more weeks)
  • Active burn rate: ~600k tokens/hour when working
  • Claude generates 2.3x more tokens than it reads
  • ~98% of all token flow is cache read; only ~2% is actual LLM output + cache writes

That last point is wild: some of my longer sessions are approaching 1 billion tokens total if you count cache, but the real consumption is a tiny fraction of that.

What I actually changed after seeing this data: I stopped spawning agent teams for tasks a single agent could handle, I removed 3 MCP plugins I never used, and I started with /compact on resumed sessions. Small things, but they add up.

A note on the data: I started collecting when my account was already at ~27% on the 7d window, so I'm missing the beginning of that cycle. A clearer picture should emerge in about 14 days, once I have 2-3 full 7d windows. I also had to add multi-account profiles on the fly: I have two accounts and need to switch between them to keep metrics consistent per account.

By the way, one Max 20x account burns through the 7d window in roughly 3 days of active work. So you're really paying for 3 heavy days, not 7.

To be fair, I'm not trying to save tokens at all; I optimize for quality. Some of my projects go through 150-200 review iterations by agents, which eats 500-650k tokens of Opus 4.6's 1M context window in a single session.

Still collecting. Will post updated numbers in a few weeks.
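The "3 heavy days" and "tiny fraction" claims can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, using the midpoint of the post's ~11-13M estimate and the ~600k/hour burn rate; the active-hours-per-day figure is my assumption, not from the post:

```python
# Back-of-the-envelope check of the numbers in the post.

CAPACITY_7D = 12_000_000     # midpoint of the ~11-13M 7d-window estimate
BURN_PER_HOUR = 600_000      # active burn rate reported in the post

hours_to_exhaust = CAPACITY_7D / BURN_PER_HOUR  # 20 active hours

ACTIVE_HOURS_PER_DAY = 7     # assumption: one "heavy day" of work
days_to_exhaust = hours_to_exhaust / ACTIVE_HOURS_PER_DAY  # ~2.9 days

# Cache share: if ~98% of total flow is cache reads, "real" consumption
# (LLM output + cache writes) is only ~2% of the headline number.
total_flow = 1_000_000_000   # a long session approaching 1B tokens
CACHE_READ_FRACTION = 0.98
real_consumption = total_flow * (1 - CACHE_READ_FRACTION)  # ~20M tokens

print(f"{hours_to_exhaust:.0f} active hours ~= {days_to_exhaust:.1f} heavy days")
print(f"real consumption: ~{real_consumption / 1e6:.0f}M of {total_flow / 1e9:.0f}B total")
```

Twenty active hours at roughly seven hours a day lands right on the "about 3 days" figure, and a 1B-token session collapses to ~20M tokens of actual model work.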

Originally posted by u/VariousComment6946 on r/ClaudeCode