Original Reddit post

A team published how they run Claude Opus 4.6 and pay less than they did on Sonnet 4.0. The savings come from what Opus doesn’t do.

A Haiku triager runs first. Its only job is to detect whether a CI failure is a duplicate of something already seen. Four out of five failures never reach Opus, and a Haiku triage call costs roughly 25x less than a full investigation.

When Opus does run, it never reads raw data. It writes specific prompts for Haiku sub-agents: “fetch the exact error messages,” “check failure rate over the last 14 days.” Haiku handles 65% of all input tokens but only 36% of spend. Opus thinks; Haiku reads. Without the model hierarchy, their daily bill more than doubles.

Does anyone run a similar tiered setup, and where’s the hardest part to tune?
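For anyone who wants to play with the numbers, here is a rough Python sketch of the triage-first routing described above. It is not the team's code: `TieredRouter`, `triage`, `investigate`, and `relative_cost` are hypothetical names, the model calls are stand-in callables rather than real API requests, and the cost formula assumes every failure pays for one triage call while only non-duplicates pay for a full investigation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TieredRouter:
    """Cheap triage first; escalate only failures that are not duplicates."""
    # Hypothetical stand-ins for model calls (no real API is used here).
    triage: Callable[[str, set[str]], bool]   # True if failure is a duplicate
    investigate: Callable[[str], str]         # expensive investigation step
    seen: set[str] = field(default_factory=set)
    total: int = 0
    escalated: int = 0

    def handle(self, failure: str) -> str:
        self.total += 1
        if self.triage(failure, self.seen):
            return "duplicate"                # never reaches the expensive tier
        self.seen.add(failure)
        self.escalated += 1
        return self.investigate(failure)


def relative_cost(dup_rate: float, triage_ratio: float) -> float:
    """Expected cost per failure, relative to investigating everything (1.0).

    Assumes each failure costs one triage call (1/triage_ratio of a full
    investigation) and non-duplicates also cost one full investigation.
    """
    return 1.0 / triage_ratio + (1.0 - dup_rate)


# Toy stand-ins: exact-match dedup and a fake investigation result.
router = TieredRouter(
    triage=lambda failure, seen: failure in seen,
    investigate=lambda failure: f"investigated: {failure}",
)
results = [router.handle(f) for f in ["a", "a", "b", "a", "a"]]
```

With the post's figures (4 in 5 failures are duplicates, triage roughly 25x cheaper), `relative_cost(0.8, 25)` comes out to about 0.24, i.e. around a quarter of the cost of sending every failure to a full investigation. This covers only the triage step; the savings from delegating raw-data reads to sub-agents are not modeled.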

Originally posted by u/jimmytoan on r/ClaudeCode