eifachposte

Before you state the obvious: no, I didn't need to use Opus 4.8 for this whole thing. It was an accident; I had been experimenting and didn't change the model back before switching focus. I'd been working on a feature and realised the cost was going up and up, so once it was done I started a fresh session to analyse why it cost so much. This is what it said:

```
Here's the breakdown:

---

Session 239ad469 — $1,555 total

Session span: 28 hours (2026-06-02 08:51 → 2026-06-03 12:55), 1,148 turns

Cost breakdown

┌──────────────────┬────────┬──────┐
│ Category         │ Tokens │ Cost │
├──────────────────┼────────┼──────┤
│ Cache creation   │ 31.2M  │ $584 │
├──────────────────┼────────┼──────┤
│ Cache reads      │ 554M   │ $832 │
├──────────────────┼────────┼──────┤
│ Output           │ 1.73M  │ $130 │
├──────────────────┼────────┼──────┤
│ Input (uncached) │ 307K   │ $5   │
└──────────────────┴────────┴──────┘

Why so expensive

Root cause: 28-hour session on Opus 4-8. Cache TTL is 5 minutes. Every time you walked away, cache expired. On return: 31M+ tokens recreated at $18.75/MTok (cache write rate). This happened repeatedly over the day.

- $584 in cache creates = ~1,660 full cache rebuilds of ~18K tokens each, or fewer rebuilds of a larger context
- Opus 4-8 for 1,148 turns — Sonnet would be $3–5× cheaper for most of those
- 46 subagents but only $19 — that part was fine

Fixes

1. /compact before stepping away — reduces context size, so cache rebuilds cost less
2. End session + start fresh after major pauses — don't leave Opus sessions open overnight
3. Use Sonnet by default — switch to Opus only for complex reasoning
```
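As a sanity check on the table, here is a rough Python sketch that divides each cost by its token count to get the implied per-million-token rate, and works out how much of the bill is cache traffic. The figures are copied from the breakdown; the variable names are mine, and because the table values are rounded the results are only approximate.

```python
# Figures copied from the session breakdown (tokens in millions, cost in USD).
breakdown = {
    "Cache creation": (31.2, 584),
    "Cache reads": (554.0, 832),
    "Output": (1.73, 130),
    "Input (uncached)": (0.307, 5),
}

# Implied price per million tokens for each category.
implied_rate = {name: cost / mtok for name, (mtok, cost) in breakdown.items()}

# Share of the listed costs that is cache traffic (creation + reads).
listed_total = sum(cost for _, cost in breakdown.values())
cache_cost = breakdown["Cache creation"][1] + breakdown["Cache reads"][1]
cache_share = cache_cost / listed_total

for name, rate in implied_rate.items():
    print(f"{name:<18} ~${rate:.2f}/MTok")
print(f"Cache share of listed costs: {cache_share:.0%}")
```

The implied cache-creation rate lands close to the $18.75/MTok quoted in the breakdown, and cache creation plus cache reads come to roughly nine tenths of the listed costs.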
I've done similar analyses of sessions before and the problem is ALWAYS the cache TTL. This is an extreme example, it's true, but it's a consistent problem for me.
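To put a number on that, here is a minimal sketch of how I'd estimate what idle gaps cost. It assumes the 5-minute TTL and the $18.75/MTok cache write rate quoted in the breakdown, and it simplifies by assuming the whole context gets rewritten after every pause longer than the TTL. The function names, the timestamps and the 200K-token context size are made up for illustration.

```python
from datetime import datetime, timedelta

CACHE_TTL = timedelta(minutes=5)      # TTL quoted in the breakdown
CACHE_WRITE_USD_PER_MTOK = 18.75      # cache write rate quoted in the breakdown


def count_expired_gaps(turn_times, ttl=CACHE_TTL):
    """Count pauses between consecutive turns that are longer than the TTL."""
    ordered = sorted(turn_times)
    return sum(1 for prev, cur in zip(ordered, ordered[1:]) if cur - prev > ttl)


def estimated_rebuild_cost(turn_times, context_tokens):
    """Rough cost if every expired gap forces a full rewrite of the context."""
    per_rebuild = context_tokens * CACHE_WRITE_USD_PER_MTOK / 1_000_000
    return count_expired_gaps(turn_times) * per_rebuild


# Hypothetical turn timestamps (minutes after start): two pauses exceed the TTL.
start = datetime(2026, 6, 2, 8, 51)
turns = [start + timedelta(minutes=m) for m in (0, 1, 3, 20, 22, 90)]

print(count_expired_gaps(turns))
print(estimated_rebuild_cost(turns, 200_000))
```

With these made-up numbers, two pauses exceed the TTL, so a 200K-token context would cost about $7.50 in rebuilds. The estimate scales with both how often I step away and how large the context has grown.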
Any suggestions on how to handle this better? Obviously "don't leave a session open overnight" (I didn't realise I'd done that), but is there anything else, other than running just one session at a time and watching it like a hawk?

Originally posted by u/niftyshellsuit on r/ClaudeCode

Cache TTL is killing me
