So I’ve been running these massive agent loops through Claude Code on a side project. Refactors, doc rewrites, test generation, all of it. A typical run goes 700+ steps, and for the longest time Opus 4.7 handled every single one. Bills were consistently around $100 per run. I kept telling myself “well the output is great,” which is true but also what I tell myself about every expensive habit I refuse to examine.

Anyway. The realization that finally got me to change things was embarrassingly obvious: most of those 700 steps are just… mundane. Read a file. Edit twenty lines. Call an endpoint. Run a test. Maybe 15% of steps involve actual hard thinking, like architectural decisions or debugging something tangled across multiple modules. The rest is basically clerical work.

So I set up a router. Opus stays on for anything I tag as complex. Everything else gets sent to cheaper models. I tested DeepSeek V4 Pro and Hunyuan Hy3 preview on the routine tier, and both worked. Hy3 is a 295B-parameter MoE model from Tencent Hunyuan with only 21B parameters active per token, so you get quality from a big expert pool but pay inference costs closer to those of a small model. Also, OpenRouter rankings showed it at #1 by tool call volume shortly after launch, which felt like a decent real-world signal that other agent setups were already leaning on it for exactly this kind of work.

Napkin math on a real run: my loops burn roughly 7M input tokens and 2.5M output tokens. Opus 4.7 at $5 per million input and $25 per million output puts that at roughly $100 all in. Route 85% of those tokens through the cheap tier (assuming tokens split roughly the way steps do; Tencent Cloud TokenHub lists Hy3 at about $0.18 per million input, $0.59 per million output) and the bulk of the run costs a couple bucks. Opus 4.7 still handles the remaining 15% for around $15. Total lands in the neighborhood of $17 to $20. I spent more time setting up the router than I saved on the first run, but it paid for itself by run three.

Now the part where I have to be honest.
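
As a quick aside, the napkin math above can be checked with a few lines of Python. This is only a sketch: the prices and token counts are the ones quoted in the post, the variable and function names are made up for illustration, and it assumes the 85/15 split applies equally to input and output tokens.

```python
# Prices in USD per million tokens, as quoted in the post.
OPUS_PRICES = {"input": 5.00, "output": 25.00}
HY3_PRICES = {"input": 0.18, "output": 0.59}

INPUT_TOKENS_M = 7.0    # roughly 7M input tokens per run
OUTPUT_TOKENS_M = 2.5   # roughly 2.5M output tokens per run
CHEAP_SHARE = 0.85      # assumption: token share tracks the share of routine steps


def run_cost(prices, input_m, output_m):
    """Cost in USD for the given millions of input and output tokens."""
    return prices["input"] * input_m + prices["output"] * output_m


def routed_cost(cheap_share=CHEAP_SHARE):
    """Return (cheap tier cost, Opus cost) when tokens are split by cheap_share."""
    opus_share = 1.0 - cheap_share
    cheap = run_cost(HY3_PRICES, INPUT_TOKENS_M * cheap_share, OUTPUT_TOKENS_M * cheap_share)
    opus = run_cost(OPUS_PRICES, INPUT_TOKENS_M * opus_share, OUTPUT_TOKENS_M * opus_share)
    return cheap, opus


all_opus = run_cost(OPUS_PRICES, INPUT_TOKENS_M, OUTPUT_TOKENS_M)
cheap, opus = routed_cost()

print(f"all Opus:   ${all_opus:.2f}")
print(f"cheap tier: ${cheap:.2f}")
print(f"Opus tier:  ${opus:.2f}")
print(f"routed:     ${cheap + opus:.2f}")
```

With these inputs the all-Opus run comes out a little under $100, the cheap tier a bit over $2, and the Opus share just under $15, which matches the low end of the $17 to $20 estimate. Retries and escalations from the cheap tier back to Opus are not modeled here and would push the real total up.
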
I tried letting the cheap tier handle a gnarly state management bug that spanned four services and it just… didn’t get there. It took three extra iterations before I gave up and kicked it back to Opus, which nailed it in one pass. My read is that the smaller active parameter count means these models miss some of the subtler cross-file connections. For deep architectural reasoning or debugging across unfamiliar codebases, Opus is still the only thing I trust.

My routing rule is dumb but it works: if a step touches more than three files or needs architectural context, Opus gets it. Everything else goes cheap.
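
The routing rule above is simple enough to sketch in a few lines of Python. This is not the actual router from the post, just an illustration of the rule as stated. The `Step` type, its field names, and the tier labels are all hypothetical, and how a step gets tagged as needing architectural context is left as a manual flag.

```python
from dataclasses import dataclass

# Placeholder tier labels, not real model identifiers.
OPUS_TIER = "opus"
CHEAP_TIER = "cheap"

# From the post: more than three files means the step goes to Opus.
MAX_CHEAP_FILES = 3


@dataclass
class Step:
    """One agent step. All fields are hypothetical, for illustration only."""
    description: str
    files_touched: int
    needs_architecture: bool = False  # assumed to be tagged by hand


def route(step: Step) -> str:
    """Send a step to Opus if it touches more than three files
    or needs architectural context; otherwise use the cheap tier."""
    if step.files_touched > MAX_CHEAP_FILES or step.needs_architecture:
        return OPUS_TIER
    return CHEAP_TIER


steps = [
    Step("edit twenty lines in one module", files_touched=1),
    Step("debug state bug across four services", files_touched=4),
    Step("decide on module boundaries", files_touched=1, needs_architecture=True),
]
for step in steps:
    print(f"{route(step):5s}  {step.description}")
```

A step touching exactly three files still goes to the cheap tier under this reading of "more than three". The rule has no escalation path, so a cheap-tier step that keeps failing, like the four-service bug, would still need to be kicked back to Opus by hand or by a separate retry policy.
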
Originally posted by u/Any-Farm-1033 on r/ClaudeCode
