Original Reddit post

In a long session every message pays for the whole history again: at 300k tokens of context even an “ok” costs 300k input. That is how limits vanish in minutes. Claude Code has two compactions, and they are not the same thing. The system one, the emergency, is always there but only fires at the model’s ceiling: on the 1M models I saw it kick in at 997,785 and 1,005,332 tokens, that is after you have already paid full price for hundreds of thousands of tokens. The preventive one, threshold based, summarizes the session much earlier and restarts light: but it only arms if something declares the context window. Out of the box that only happens on some 200k models (in my tests Sonnet 4.6 yes, Haiku 4.5 never); otherwise you declare the window yourself with an env var, or the desktop app declares it when it creates a new chat. The 1M sessions get no default at all. I wanted the preventive one as early as possible, to pay less context: I was moving the threshold up front with the env vars ( CLAUDE_AUTOCOMPACT_PCT_OVERRIDE , an undocumented test knob that brings the percentage forward, and CLAUDE_CODE_AUTO_COMPACT_WINDOW for the window). That is where I saw what happens when it is not armed: it never fires at any threshold, default included, and (on the desktop app, where I measured it) the blocking-limit warning never shows up either. Nothing tells you that you are running without a seatbelt: the status line keeps showing “X% left until auto-compact” as if everything worked. The worst case is the desktop app with the 1M models (Fable 5 and Opus 4.8, which is where I used them): new chats are protected because the app declares the window, but resume that same chat after a restart or an update and the app stops declaring it: the preventive compaction turns off and never comes back on its own. Three of my sessions resumed like that, at 280k, 308k and 652k tokens, never compacted once: every message paid for the full context again. Two were saved only by the emergency at the ceiling, the numbers above. Full repro in issue 67806 (Claude Code 2.1.175 CLI and 2.1.170 desktop). The 30 second fix : add CLAUDE_CODE_AUTO_COMPACT_WINDOW=1000000 to the env block of ~/.claude/settings.json . It is read at runtime: two of my already disarmed chats compacted on their very next message (at 156,781 and 200,248 tokens), nothing to restart. And it is safe with smaller models: requested on Haiku, the log shows effectiveWindow=180000 , so the value gets clamped to the model’s real window. On its own though, it only reconnects the brake: the default threshold still sits near the top of the window, so on a 1M model you keep riding huge contexts. To actually compact early, pair it with CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=25 (fires around 250k on a 1M model; pick whatever percentage fits your work). One var connects the brake, the other decides how early it brakes. The percentage alone, in the disarmed cases, does nothing. To check if this affects you, from a terminal, resuming a session that already has some weight (tens of thousands of tokens or more: the check works off the real usage of the last turn, so on a near-empty session zero lines prove nothing): claude -p “hi” --resume <session-id> --debug-file=/tmp/a.log grep -c “autocompact:” /tmp/a.log Zero lines = the preventive check is not running. One caveat: if you already have CLAUDE_CODE_AUTO_COMPACT_WINDOW or CLAUDE_AUTOCOMPACT_PCT_OVERRIDE set in your environment or env block, the test is void: the env var is protecting you. Curious how many people have this off without knowing: post your model and what you find in the log. submitted by /u/DebateStreet2281

Originally posted by u/DebateStreet2281 on r/ClaudeCode