Original Reddit post

Like a lot of people here, the usage limit changes pushed me to audit where my Claude Code context goes. The question I’m asking: what am I sending the agent that it doesn’t use? MCPs that aren’t needed for a project. Bloated PRDs and design docs. Excessively long skills/sub-agents/slash commands. A CLAUDE.md that grew endlessly. Most of these need judgment to trim safely without changing behavior.

Test runner output is different. It’s always the same shape, and the agent only needs a small piece of it: did the tests pass, and if not, which ones failed and where?

Thesis: trimming test runner output to only what is useful will reduce token bloat.

On one of my projects (691 tests across 34 files), a clean run prints 62 lines of vitest output. The agent only needs “did they all pass.” When tests fail, each failure adds another 25 to 30 lines of error info, most of it vitest junk the agent doesn’t use.

I built a small CLI wrapper that runs the test command, strips the output down to just what matters, and exposes it via MCP. Same exit code, same failures preserved. On that same personal project, those 62 lines collapse to one: “691 tests passed in 7.09s.” (A rough sketch of the idea follows the table below.)

Here’s what it does on some common OSS projects (token counts, before → after):

| Project | Tokens before | Tokens after | Reduction |
| --- | ---: | ---: | ---: |
| tinybench | 139 | 24 | 82.7% |
| pathe | 194 | 24 | 87.6% |
| destr | 271 | 24 | 91.1% |
| mlly | 991 | 128 | 87.1% |
| hono | 28,213 | 662 | 97.7% |
| vue | 134,301 | 11,677 | 91.3% |
| nuxt | 131,866 | 64 | 99.95%\* |

\* Nuxt’s build was broken when I ran this, so vitest dumped 70+ stack traces (one per suite that couldn’t load). crux collapses those into a single “build broken, fix that first” line. A different shape of compression than the others, but a big win all the same.
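
To make the mechanism concrete, here’s a minimal sketch of the wrap-and-filter idea. This is not crux’s actual implementation, and the vitest line patterns are illustrative; the point is just run, capture, filter, with the exit code preserved:

```ts
// sketch.ts — run a test command, keep only the lines an agent needs.
// Not crux's real code; the filters below are illustrative guesses.
import { spawnSync } from "node:child_process";

// Usage: node sketch.js npx vitest run
const [cmd, ...args] = process.argv.slice(2);
const result = spawnSync(cmd, args, { encoding: "utf8" });
const raw = (result.stdout ?? "") + (result.stderr ?? "");

// Strip ANSI color codes so the line filters see plain text.
const plain = raw.replace(/\x1b\[[0-9;]*m/g, "");

// Keep the run summary plus anything that looks like a failure.
const kept = plain.split("\n").filter(
  (line) =>
    /^\s*(Test Files|Tests|Duration)\s/.test(line) || // vitest summary lines
    /FAIL|AssertionError|✕/.test(line)                // failure markers
);

console.log(kept.join("\n"));
process.exit(result.status ?? 1); // same exit code as the wrapped command
```

A real version needs more care with failure blocks (keeping the assertion diff and the file:line, dropping the framework frames), but the shape is the same.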

Median reduction is ~89%, and the test suites run at the same speed. Is it going to magically fix Claude’s usage limits? No. But tokens saved are tokens saved, and this adds no overhead.

Notes: I used Anthropic’s /v1/messages/count_tokens against claude-opus-4-7 for the table, and stripped ANSI from both inputs before counting (roughly the sketch at the end of this post). Sonnet 4.6 ratios match within 1 percentage point on every target.

Repo: https://github.com/slgoodrich/crux-cli

Full benchmark and methodology: https://github.com/slgoodrich/crux-cli/blob/main/docs/report-v0.1.md

Scope: vitest only today. Jest next, then pytest, cargo test, and go test.

Curious what other noisy inputs folks have spotted and what you’ve tried. I’m kicking around some other ideas to test.
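
For anyone who wants to reproduce the counting, it amounts to roughly this (a sketch against Anthropic’s /v1/messages/count_tokens endpoint; the helper name is mine, and the model string is the one from the post):

```ts
// Count the tokens a block of CLI output would cost as a user message.
const stripAnsi = (s: string) => s.replace(/\x1b\[[0-9;]*m/g, "");

async function countTokens(text: string): Promise<number> {
  const res = await fetch("https://api.anthropic.com/v1/messages/count_tokens", {
    method: "POST",
    headers: {
      "x-api-key": process.env.ANTHROPIC_API_KEY!,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-opus-4-7", // model string as given in the post
      messages: [{ role: "user", content: stripAnsi(text) }],
    }),
  });
  const data = (await res.json()) as { input_tokens: number };
  return data.input_tokens;
}
```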

Originally posted by u/wewerecreaturres on r/ClaudeCode