Original Reddit post

We set effort=low expecting roughly the same behavior as OpenAI’s reasoning.effort=low or Gemini’s thinking_level=low. But with effort=low, Opus 4.6 not only thought less, it acted lazier: it made fewer tool calls, was less thorough in its cross-referencing, and we even found it effectively ignoring parts of our system prompt telling it how to do web research (trace examples/full details: https://everyrow.io/blog/claude-effort-parameter). Our agents were returning confidently wrong answers because they just stopped looking. Bumping to effort=medium fixed it.

In Anthropic’s defense, this is documented; I just didn’t read carefully enough before kicking off our evals. So it’s not a bug. But since Anthropic’s effort parameter is intentionally broader than other providers’ equivalents (it controls general behavioral effort, not just reasoning depth), you can’t treat effort as a drop-in replacement for reasoning.effort or thinking_level if you’re working across providers.

Do you think reasoning and behavioral effort should be separate knobs, or is bundling them the right call?

submitted by /u/ddp26
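For anyone juggling all three APIs, the mismatch above can be sketched as a small payload builder. The knob names (effort, reasoning.effort, thinking_level) come from the post itself, but the model ids and surrounding payload shapes are illustrative assumptions, not verified SDK calls; check each provider’s current docs before using them:

```python
# Sketch: the "same" low-effort knob, spelled per provider.
# Field shapes and model ids are best-effort assumptions and may
# differ across API/SDK versions -- verify against provider docs.

def build_request(provider: str, prompt: str) -> dict:
    """Return a minimal request payload with the provider's effort/thinking knob."""
    if provider == "anthropic":
        # Anthropic: one `effort` level governing overall behavior
        # (tool-call thoroughness, instruction-following), not just
        # thinking depth. "medium" fixed the laziness seen at "low".
        return {
            "model": "claude-opus-4-6",   # hypothetical model id
            "max_tokens": 1024,
            "effort": "medium",
            "messages": [{"role": "user", "content": prompt}],
        }
    if provider == "openai":
        # OpenAI: `reasoning.effort` scopes reasoning depth only.
        return {
            "model": "o4-mini",           # example reasoning model
            "reasoning": {"effort": "low"},
            "input": prompt,
        }
    if provider == "gemini":
        # Gemini: `thinking_level` likewise scopes thinking depth only.
        return {
            "model": "gemini-3-pro",      # example model id
            "generation_config": {"thinking_level": "low"},
            "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        }
    raise ValueError(f"unknown provider: {provider}")
```

The point of the sketch: a naive cross-provider wrapper that maps one "low" setting onto all three knobs will quietly change agent behavior on Anthropic, because its knob is behavioral, not just cognitive.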

Originally posted by u/ddp26 on r/ClaudeCode