Look, I don't know what's going on behind the scenes, but I think a lot of folks here want to see the streaming CoT thinking, both to keep track of the LLM's direction and to better control token usage/wastage. If you're AR-ing away without showing the actual work behind the answer and just outputting a single-sentence summary, I don't want to pay API prices; I'd rather go with Codex, which is plenty generous with its limits, even if it means wasting 2x more tokens to get to the same place! You have truly gone from "best" to one of the many...

Additionally, hiding the CoT assumes the LLM has truly reached the global minimum in your (I assume GQA) model, which is incorrect. Unless you can prove that, showing what the LLM is "thinking" is critical for judging whether it has sufficiently reached an "acceptable" minimum. Developing safety-critical or financial software requires absolute determinism, which you cannot guarantee without a global minimum... For now, human intervention is the only way to stop AR models from going in circles and wasting tokens, which I'd think you'd care about, since usage is an issue, isn't it? Anthropic pls fix thx!
Originally posted by u/smashedshanky on r/ClaudeCode
