**TL;DR:** I parsed Claude Code's local JSONL conversation files and cross-referenced them against the per-charge billing data from my Anthropic dashboard. Over Feb 3-12, I can see 206 individual charges totaling $2,413.25 against 388 million tokens recorded in the JSONL files. That works out to $6.21 per million tokens — almost exactly the cache creation rate ($6.25/M), not the cache read rate ($0.50/M). Since cache reads are 95% of all tokens in Claude Code, this means the advertised 90% cache discount effectively doesn't apply to Max plan extra usage billing.

## My Setup

- **Plan:** Max 20x ($200/month)
- **Usage:** Almost exclusively Claude Code (terminal). Rarely use claude.ai web.
- **Models:** Claude Opus 4.5 and 4.6 (100% of my usage)
- **Billing period analyzed:** Feb 3-12, 2026

## The Data Sources

**Source 1 — JSONL files:** Claude Code stores every conversation as JSONL files in `~/.claude/projects/`. Each assistant response includes exact token counts:

```json
{
  "type": "assistant",
  "timestamp": "2026-02-09T…",
  "requestId": "req_011CX…",
  "message": {
    "model": "claude-opus-4-6",
    "usage": {
      "input_tokens": 10,
      "output_tokens": 4,
      "cache_creation_input_tokens": 35039,
      "cache_read_input_tokens": 0
    }
  }
}
```

My script scans all JSONL files, deduplicates by `requestId` (streaming chunks share the same ID), and sums token usage. No estimation — this is the actual data Claude Code recorded locally.

**Source 2 — Billing dashboard:** My Anthropic billing page shows 206 individual charges from Feb 3-12, each between $5 and $29 (most are ~$10, suggesting a $10 billing threshold).

## Token Usage (from JSONL)

94.77% of all tokens are cache reads. This is normal for Claude Code — every prompt re-sends the full conversation history and system context, and most of it is served from the prompt cache.

Note: The day-by-day table below totals 388.7M tokens (1.2M more) because the scan window captures a few requests at date boundaries. This 0.3% difference doesn't affect the analysis — I use the conservative higher total for $/M calculations.

## Day-by-Day Cross-Reference

Key observations:
- **Feb 6-7:** 1,181 API calls and 127M tokens with **zero charges**. These correspond to my weekly limit reset — the Max plan resets weekly usage limits, and these days fell within the refreshed quota.
- **Feb 11:** Only 72 API calls and 5.6M tokens, but **$703 in charges (53 line items)**. This is clearly billing lag — charges from earlier heavy usage days being processed later.
- The per-day $/M rate varies wildly because charges don't align 1:1 with the day they were incurred. But the overall rate converges to **$6.21/M**.

## What This Should Cost (Published API Rates)

Opus 4.5/4.6 published pricing:

| Token type | Rate ($/M) |
| --- | --- |
| Input | $5.00 |
| Output | $25.00 |
| Cache write (5-minute) | $6.25 |
| Cache read | $0.50 |

## The Discrepancy

### Reverse-Engineering the Rate

If I divide total billed ($2,413.25) by total tokens (388.7M):

$2,413.25 ÷ 388.7M = **$6.21 per million tokens**

The blended rate across all my tokens is $6.21/M — within 1% of the **cache creation rate**.

### Scenario Testing

I tested multiple billing hypotheses against my actual charges. **Best match:** billing all input-type tokens (input + cache creation + cache reads) at the 5-minute cache creation rate ($6.25/M). This produces $2,425 — within 0.5% of my actual $2,413.

## Alternative Explanations I Ruled Out

Before concluding this is a cache-read billing issue, I checked every other pricing multiplier that could explain the gap:

- **Long context pricing (>200K tokens = 2x rates):** I checked every request in my JSONL files. The maximum input tokens on any single request was ~174K. Zero requests exceed the 200K threshold, so long context pricing does not apply.
- **Data residency pricing (1.1x for US-only inference):** I'm not on a data residency plan, and data residency is an enterprise feature that doesn't apply to Max consumer plans.
- **Batch vs. real-time pricing:** All Claude Code usage is real-time (interactive). Batch API pricing (50% discount) is only for async batch jobs.
- **Model misidentification:** I verified all requests in JSONL are `claude-opus-4-5-*` or `claude-opus-4-6`. Opus 4.5/4.6 pricing is $5/$25/M (not the older Opus 4.0/4.1 at $15/$75/M).
- **Service tier:** Standard tier; no premium pricing applies.

None of these explain the gap. The only hypothesis that matches my actual billing within 0.5% is: **cache reads billed at the cache creation rate**.

## What Anthropic's Own Docs Say

Anthropic's Max plan page states that extra usage is billed at **"standard API rates"**. The API pricing page lists differentiated rates for cache reads ($0.50/M for Opus) vs. cache writes ($6.25/M). Anthropic's own Python SDK calculates costs using these differentiated rates. The token counting cookbook explicitly shows cache reads as a separate, cheaper category.

There is no published documentation stating that extra usage billing treats cache reads differently from API billing. If it does, that's an undisclosed pricing change.

## What This Means

The 90% cache read discount ($0.50/M vs. $5.00/M input) is a core part of Anthropic's published pricing. It's what makes prompt caching economically attractive. But for Max plan extra usage, my data suggests all input-type tokens are billed at approximately the same rate — the cache creation rate. Since cache reads are 95% of Claude Code's token volume, this effectively multiplies the real cost by ~8x compared to what published pricing would suggest.

## My Total February Spend

My billing dashboard shows **$2,505.51** in total extra usage charges for February. (The $2,413.25 above is just the charges I could itemize from Feb 3-12 — there are likely additional charges from Feb 1-2 and Feb 13+ not shown in my extract.)

## Charge Pattern

- 205 of 206 charges are $10 or more
- 69 charges fall in the $10.00-$10.50 range (the most common bucket)
- Average charge: $11.71
- This suggests a **$10 billing threshold** — usage accumulates until it hits ~$10, then fires a charge

## Caveats

- **JSONL files only capture Claude Code usage**, not claude.ai web. I rarely use web, but some billing could be from there.
- **Billing lag exists** — charges don't align 1:1 with the day usage occurred. The overall total is what matters, not per-day rates.
- **Weekly limit resets explain zero-charge days** — Feb 6-7 had 127M tokens with zero charges because my weekly usage limit had just reset. The $2,413 is for usage that exceeded the weekly quota.
- **Anthropic hasn't published how extra usage billing maps to token types.** It's possible billing all input tokens uniformly is intentional policy, not a bug.
- **JSONL data is what Claude Code writes locally** — I'm assuming it matches server-side records.

## Questions for Anthropic

1. **Are cache read tokens billed at $0.50/M or $6.25/M for extra usage?** The published pricing page shows $0.50/M, but my data shows ~$6.21/M.
2. **Why isn't the extra usage billing formula published?** Users need to know the real cost before they accumulate $2,500+ in charges.
3. **Can the billing dashboard show per-token-type breakdowns?** Right now it just shows dollar amounts with no token detail.
4. **Is the subscription quota consuming the cheap cache reads first, leaving expensive tokens for extra usage?** If quota credits are applied to cache reads at $0.50/M, that would use very few quota credits per read, pushing most reads into extra-usage territory.

## Related Issues

- GitHub #22435 — Inconsistent quota burn rates, opaque billing formula
- GitHub #24727 — Max 20x user charged extra usage while dashboard showed 73% quota used
- GitHub #24335 — Usage tracking discrepancies

## How to Audit Your Own Usage

I built **attnroute**, a Claude Code hook with a BurnRate plugin that scans your local JSONL files and computes exactly this kind of audit. Install it and run the billing audit:

```bash
pip install attnroute
```

```python
# See the attnroute docs for the exact import path of BurnRatePlugin.
plugin = BurnRatePlugin()
audit = plugin.get_billing_audit(days=14)
print(plugin.format_billing_audit(audit))
```
This gives you a full breakdown: all four token types with percentages, cost at published API rates, a "what if cache reads are billed at creation rate" scenario, and a daily breakdown with cache read percentages. Compare the published-rate total against your billing dashboard — if your dashboard charges are closer to the flat-rate scenario than the published-rate estimate, you're likely seeing the same issue.
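If you'd rather not install anything, the dedup-and-sum pass described above (scan `~/.claude/projects/`, deduplicate by `requestId`, sum the four usage counters) can be sketched with the standard library alone. This is my own sketch, not attnroute's implementation; the field names follow the JSONL sample earlier in the post, so adjust them if your local files differ:

```python
# Stand-alone audit sketch: scan Claude Code's local JSONL files,
# deduplicate streamed chunks by requestId, and sum the token counters.
import json
from collections import Counter
from pathlib import Path

USAGE_KEYS = ("input_tokens", "output_tokens",
              "cache_creation_input_tokens", "cache_read_input_tokens")

def audit(root: Path = Path.home() / ".claude" / "projects") -> Counter:
    totals: Counter = Counter()
    seen: set[str] = set()
    for path in root.rglob("*.jsonl"):
        for line in path.read_text(errors="ignore").splitlines():
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue
            if rec.get("type") != "assistant":
                continue
            req_id = rec.get("requestId")
            if req_id in seen:           # streaming chunks share an ID
                continue
            if req_id is not None:
                seen.add(req_id)
            usage = rec.get("message", {}).get("usage", {})
            for key in USAGE_KEYS:
                totals[key] += usage.get(key, 0) or 0
    return totals

if __name__ == "__main__":
    totals = audit()
    grand = sum(totals.values())
    for key in USAGE_KEYS:
        share = totals[key] / grand * 100 if grand else 0.0
        print(f"{key:32} {totals[key]:>15,}  ({share:5.2f}%)")
```

Divide your billed dollars by the grand total (in millions) and you have the same blended $/M figure used throughout this post.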
attnroute also does real-time rate limit tracking (5h sliding window with burn rate and ETA), per-project/per-model cost attribution, and full historical usage reports. It's the billing visibility that should be built into Claude Code.
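The scenario test itself reduces to a few lines of arithmetic. Here is a sketch of the two competing formulas — function names are mine, and the rates are the published Opus 4.5/4.6 figures quoted earlier in the post:

```python
# Published Opus 4.5/4.6 rates quoted in the post, in $ per million tokens.
PUBLISHED = {"input_tokens": 5.00, "output_tokens": 25.00,
             "cache_creation_input_tokens": 6.25,  # 5-minute cache write
             "cache_read_input_tokens": 0.50}      # 90% discount vs input

def published_cost(tokens: dict) -> float:
    """Cost if each token type is billed at its published API rate."""
    return sum(n / 1e6 * PUBLISHED[k] for k, n in tokens.items())

def flat_input_cost(tokens: dict) -> float:
    """The post's best-match hypothesis: every input-type token billed
    at the cache-creation rate; output billed at its published rate."""
    inputs = sum(n for k, n in tokens.items() if k != "output_tokens")
    return inputs / 1e6 * 6.25 + tokens.get("output_tokens", 0) / 1e6 * 25.00
```

Feed in the totals from your own JSONL audit and compare both numbers against your billing dashboard — whichever is closer tells you which formula you're effectively being billed under.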
**Edit:** I'm not claiming fraud. This could be an intentional billing model where all input tokens are treated uniformly, a system bug, or something I'm misunderstanding about how cache tiers work internally. But the published pricing creates a clear expectation that cache reads cost $0.50/M (90% cheaper than input), and Max plan users appear to be paying $6.25/M. Whether intentional or not, that's a **12.5x gap on 95% of your tokens** that needs to be explained publicly.
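A quick sanity check on the headline figures, using only the totals quoted in this post:

```python
billed_usd = 2413.25   # itemized charges, Feb 3-12
tokens_m = 388.7       # JSONL token total, in millions
cache_write = 6.25     # $/M, Opus 5-minute cache creation
cache_read = 0.50      # $/M, Opus cache read

blended = billed_usd / tokens_m
print(f"blended rate:   ${blended:.2f}/M")                # $6.21/M
print(f"vs cache write: {blended / cache_write:.1%}")     # ~99.3%
print(f"write/read gap: {cache_write / cache_read:.1f}x") # 12.5x
```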
**If you're a Max plan user with extra usage charges**, I'd recommend:

1. Install **attnroute** and run `get_billing_audit()` to audit your own token usage against published rates.
2. Contact Anthropic support with your findings — reference that their docs say extra usage is billed at "standard API rates", which should include the $0.50/M cache read rate.
3. File a billing dispute if your numbers show the same pattern.
(Tip: just have Claude run the audit for you with the attnroute BurnRate plugin.)
Originally posted by u/jcmguy96 on r/ClaudeCode
