I’ve been seeing a lot of people on Reddit saying their Claude Max 5x or 20x quotas are getting burned way faster than before. Honestly, a big part of this is expected. Sonnet 4.6 and Opus 4.6 are simply heavier models than previous versions. They think more, they write more, they consume more tokens. That alone already increases usage.

But the real problem is not the model. It’s how people are choosing to use it. Many users are treating Opus like their default tool. They use it for simple implementations, small refactors, basic code reviews, quick explanations. Of course the quota will vanish fast if you do that.

Sonnet today is extremely capable. If you give it a clear spec and well-defined requirements, it can handle the vast majority of real development tasks. Think of Sonnet as a strong senior developer. Solid judgment. Great delivery. Fast enough. Cheap enough.

Opus should be treated more like a specialist. You bring it in when things get truly complex. Deep architectural decisions. Hard debugging sessions. Very large system design. Situations where Sonnet genuinely struggles.

Another silent token killer is over-automation. Some people configure tons of subagents. They get triggered all the time, even when they add little value. Every invocation adds hidden token costs.

The same happens with massive CLAUDE.md files. I’ve seen setups with 200+ lines of global context. That entire block keeps getting injected again and again. Tokens get drained before the real work even starts.

If you want your quota to last longer, the mindset needs to change. Use Sonnet by default. Escalate to Opus only when necessary. Keep subagents lean and intentional. Trim global context to what actually matters.

The model is not wasting your quota. Most of the time, your workflow is.
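
To put rough numbers on the CLAUDE.md point, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function name `context_overhead`, the 12 tokens per line, and the 50 requests per session are made up, and it ignores any caching or discounting that may apply in practice. It only shows how a block that gets re-sent on every request scales with its size.

```python
# Back-of-the-envelope estimate, not a measurement.
# All figures below are hypothetical and chosen only for illustration.

def context_overhead(lines: int, tokens_per_line: int, requests: int) -> int:
    """Tokens spent re-sending the same global context on every request."""
    return lines * tokens_per_line * requests


# Assumed: ~12 tokens per line, 50 requests in a working session.
big = context_overhead(lines=200, tokens_per_line=12, requests=50)
lean = context_overhead(lines=40, tokens_per_line=12, requests=50)

print(f"200-line file: {big} tokens")
print(f"40-line file:  {lean} tokens")
print(f"saved:         {big - lean} tokens")
```

Plug in your own line count and session length. The exact values matter less than the shape: the overhead grows linearly with both the file size and the number of requests, so trimming the file pays off on every single call.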
Originally posted by u/Then_Shallot3226 on r/ClaudeCode
