Original Reddit post

A problem. I run a lot of projects, by some accounts several times more than colleagues who hit their limits faster than I do, all on a $200 Max plan (I used to run a $20 Codex plan alongside it too, talking to my Claude environment). On the API the same work would’ve cost me something like $10,000. So I got slightly obsessed with where the tokens actually go, mostly because I really don’t like bumping into the daily and weekly limits. I’ll go first. The thing that got me: every turn re-sends the whole conversation again. So if you’re 20 turns into a session and you fire off message 21, everything you’ve ever sent in that session goes back up with it. There’s a 5-minute cache that saves you if your next message lands inside the window, but if you work the way I used to, one long session with gaps between messages, it almost never kicks in. I worked out I was re-sending something like 20MB of text every time I hit enter, for basically no value. Around 95% of my tokens turned out to be pure - bascially useless ‘context’ - replay. So my quest. Killing recurring bash errors at the source with hooks so they stop replaying, and building a self-learning system that handles new errors as they show up. A status-line nudge to keep me honest about session length (it’s on GitHub, and it’s very handy m- based on the behavioral science of nudges, even to nudge me). Tightened my handover flow so picking a thread back up doesn’t drag the whole history along. Moved reference data out of loose files into a database, cut unused skills, switched off idle MCP servers. All of it quietly costs you just by being there. None of it is about doing less. Same output, honestly better, just without the bleed. So that’s mine. What about you? What have you actually done to cut the waste? Not “be concise”. The obsessive stuff you only found by digging into your own usage. submitted by /u/marksterberlin

Originally posted by u/marksterberlin on r/ClaudeCode