I’m not seeing this comparison anywhere — curious if others have data. The variables everyone debates:
- Model choice (Opus vs. Sonnet vs. GPT-4o, etc.)
- Effort level (low / medium / high)
- Extended thinking on vs. off

The variable nobody seems to measure:
- Number of human iterations (back-and-forth turns to reach acceptable output)

What I’ve actually observed: AI almost never gets complex tasks right on the first pass. Basic synthesis from specific sources? Fine. But anything where you’re genuinely delegating thinking — not just retrieval — the first response lands somewhere between “in the ballpark” and “completely off.”

Then you go back and forth 2-3 times. That’s when it gets magical. Not because the model got smarter, but because you refined the intent, and the model got closer to what you actually meant.

The metric I think matters most: end-to-end time. Not LLM processing time — the full elapsed time from your first message to when you close the conversation and move on.

If I run Opus at medium effort, no extended thinking, and go back and forth twice, I’m often done before high-effort extended thinking returns its first response on a comparable task. And then I still have to correct that first response. It’s never final.

My current default: Opus or Sonnet at medium, no extended thinking. Research actually suggests extended thinking can make outputs worse in some cases (not just slower). But even setting that aside — if the first response always needs refinement anyway, front-loading LLM “thinking time” seems like optimizing the wrong thing.

The comparison I’d want to see properly mapped: has anyone actually run it, or found research that does? I keep seeing threads about “which model wins” and “does extended thinking help” — but the human-in-the-loop variable seems chronically underweighted in the conversation.

Full source: github.com/jonathanmalkin/jules

Building AI systems for communities mainstream tech ignores.
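If anyone wants to run this comparison themselves, here is a minimal sketch of how the metric could be instrumented — a hypothetical `SessionTimer` helper (not from the linked repo), assuming you manually mark each human follow-up turn:

```python
import time
from dataclasses import dataclass, field

@dataclass
class SessionTimer:
    """Tracks one task's full elapsed time plus human iteration count."""
    start: float = field(default_factory=time.monotonic)
    turns: int = 0

    def record_turn(self) -> None:
        # Call once per human follow-up message (one back-and-forth iteration).
        self.turns += 1

    def close(self) -> dict:
        # End-to-end time: from first message to closing the conversation,
        # not just LLM processing time.
        return {"turns": self.turns,
                "elapsed_s": time.monotonic() - self.start}

# Example: a medium-effort run that needed two refinement turns
timer = SessionTimer()
timer.record_turn()  # first clarification
timer.record_turn()  # second refinement
summary = timer.close()
```

Logging `(model, effort, extended_thinking, turns, elapsed_s)` per task over a few weeks would give exactly the data this comparison needs.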
Originally posted by u/jonathanmalkin on r/ClaudeCode
