eifachposte


TL;DR: Claude Code caches your prompts as you go. When continuing an existing conversation, the part of your prompt that is already cached is billed at only 10% of the full input price. By default, Claude Code in billed-per-token setups uses a prompt cache TTL of 5 mins. This means that whenever you take longer than 5 mins to continue a Claude Code session, the cache has expired and you’ll pay full price for the whole conversation on that turn.

The time of being more conscious of our token usage is upon us 🙌 So I went down a rabbit hole to figure out how to best make use of Claude Code’s prompt prefix caching mechanism. Here’s what I came up with. If you’re interested, the full official docs are here and are very good and detailed.

How the cache works

Prompt caching is a prefix cache. Every turn, the API matches the start of your request (model + system prompt + project context + full convo history) against what has recently been cached, and only the newly appended bit of the conversation is fresh work.

A cache write is when Claude Code commits the current conversation up to that point to be cached for a certain TTL (time to live): 5 mins or 1 hour, depending on auth type or configuration. If following turns in a Claude Code session start with that exact prompt “prefix”, then that cache is used and that part of the conversation is billed at a highly discounted rate.

Change anything earlier in that prefix and you’ll get a cache miss. Everything will be re-read (or re-committed as a cache) and you’ll be billed for the whole context again.

Cached prefixes expire after inactivity, but every cache hit resets the TTL, so an active session stays available as cache.

Cache pricing (relative to base input price)

- Cache read = ~0.1x (10%)
- Cache write (5m TTL) = 1.25x
- Cache write (1h TTL) = 2x

Default cache TTL depends on how you auth

On a Claude subscription (personal Pro/Max accounts, for example), the main conversation auto-uses the 1h TTL at no extra cost.
It drops to 5m only if you’re over your plan limit and on usage credits.

On an enterprise billed-per-token / API key / Bedrock / Vertex setup, the default is 5m, because the 1h TTL cache is more expensive upfront.

You can override the cache TTL manually with ENABLE_PROMPT_CACHING_1H=1 or FORCE_PROMPT_CACHING_5M=1.

Subagents always use 5m, even on a subscription.

The cost breakdown: hits vs. misses

To visualize the cost impact of caching, let’s take an imaginary example: a 3,000 token base prompt, followed by 5 conversational rounds adding 1,000 tokens each.

The math:

- On a cache hit: You pay the 10% read rate for the accumulated context, plus the write premium (1.25x for 5m, 2x for 1h) only for the 1k new tokens.
- On a cache miss: The window expired. You pay the write premium to re-cache the entire context from scratch.

Here is the total token cost for the entire 5-round session compared to a non-cached baseline:

[Table not preserved in this copy of the post.]

Some takeaways and tips

- The most cost-effective workflow is to aim to always hit the 5 min window for long-running tasks and sessions.
- If you can’t do that consistently (meetings, context switching, multitasking), consider switching to the 1h TTL, but make sure to take advantage of those cache windows, otherwise you’ll end up spending more. This makes me think that multitasking makes it pretty hard to hit these caches effectively with the 5 min TTL.
- If you’re planning to take a break but want to continue the session later on, consider either:
  - Running /compact while the cache is still warm, before going on the break.
  - Telling Claude to “manually” persist and compact the session into files that a fresh session can pick up from scratch.
- Corollary to the previous point: there is no point, from a cost perspective, in running /compact on a previous long session after it has already gone out of cache. It’ll cost more than just continuing from where it left off.
- Be careful with changes mid-session to some settings like model type, effort level, plugins or MCPs.
Some of them might invalidate the cache because they change something in Claude’s internal system prompt. Check the official docs for specific details about this.

submitted by /u/jomi-se
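For anyone who wants to play with the numbers, here is a minimal Python sketch of the hit/miss math from the cost breakdown in the post. It is a rough model and not how Claude Code actually bills. The function names (`session_cost`, `baseline_cost`) and the `WARM`/`COLD` scenario lists are made up for illustration. It assumes the first round is always a cold write, ignores output tokens, and uses the multipliers quoted in the post. Costs are in "base-price input token equivalents".

```python
# Rough cost model for the example in the post:
# a 3,000-token base prompt followed by 5 rounds of 1,000 new tokens each.

# Multipliers relative to the base input price, as quoted in the post.
CACHE_READ = 0.10
CACHE_WRITE_5M = 1.25
CACHE_WRITE_1H = 2.00

BASE_PROMPT = 3_000  # tokens in the initial prompt
PER_ROUND = 1_000    # new tokens appended each round
ROUNDS = 5


def session_cost(write_multiplier: float, hits: list[bool]) -> float:
    """Total input cost of a session, in base-price token equivalents.

    hits[k] says whether round k found its prefix in the cache.
    Assumption: output tokens are ignored entirely.
    """
    total = 0.0
    prefix = BASE_PROMPT  # context accumulated before this round's new tokens
    for hit in hits:
        if hit:
            # Read the cached prefix cheaply, write only the new tokens.
            total += CACHE_READ * prefix + write_multiplier * PER_ROUND
        else:
            # Cache expired (or never existed): re-write the whole context.
            total += write_multiplier * (prefix + PER_ROUND)
        prefix += PER_ROUND
    return total


def baseline_cost(rounds: int = ROUNDS) -> float:
    """Cost with no caching at all: full input price for everything, every round."""
    return float(sum(BASE_PROMPT + PER_ROUND * k for k in range(1, rounds + 1)))


# Assumption: round 1 is always a cold write, so it is never a hit.
WARM = [False] + [True] * (ROUNDS - 1)  # every later round hits the cache
COLD = [False] * ROUNDS                 # every round arrives after the TTL expired

if __name__ == "__main__":
    print("no caching      ", baseline_cost())
    print("5m TTL, all hits", session_cost(CACHE_WRITE_5M, WARM))
    print("5m TTL, all miss", session_cost(CACHE_WRITE_5M, COLD))
    print("1h TTL, all hits", session_cost(CACHE_WRITE_1H, WARM))
    print("1h TTL, all miss", session_cost(CACHE_WRITE_1H, COLD))
```

Under these assumptions the non-cached baseline comes to 30,000 token equivalents. Always hitting the cache costs about 12,200 with the 5m TTL and about 18,200 with the 1h TTL. Always missing costs about 37,500 (5m) or 60,000 (1h), which is worse than not caching at all. Pass a mixed `hits` list to see how many misses your own workflow can absorb before the 1h TTL becomes the cheaper option.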

Originally posted by u/jomi-se on r/ClaudeCode

How prompt caching works in Claude Code (and how to stop wasting tokens)
