Original Reddit post

I want to start a serious thread about repeated Claude Code and Opus quality regressions without turning this into another useless fight between “skill issue” and “conspiracy.” My position is narrow, evidence-based, and, I think, difficult to dismiss honestly.

First, there is a difference between these three claims:

  1. Users have repeatedly observed abrupt quality regressions.
  2. At least some of those regressions were real service-side issues rather than just user error.
  3. The exact mechanism was intentional compute-saving behavior such as heavier quantization, routing changes, fallback behavior, or something similar.

I think claim 1 is clearly true. I think claim 2 is strongly supported. I think claim 3 is plausible, technically serious, and worth discussing, but not conclusively proven in public. That distinction matters because people in this sub keep trying to refute claim 3 as if that somehow disproves claims 1 and 2. It does not.

There have been repeated user reports over time describing abrupt drops in Claude Code quality, not just isolated complaints from one person on one bad day. A widely upvoted “Open Letter to Anthropic” thread described a “precipitous drop off in quality” and said the issue was severe enough to make users consider abandoning the platform.

Source: https://www.reddit.com/r/ClaudeCode/comments/1m5h7oy/open_letter_to_anthropic_last_ditch_attempt/

Another discussion explicitly referred to “that one week in late August 2025 where Opus went to shit without errors,” which is notable because even a generally positive user was acknowledging a distinct bad period.

Source: https://www.reddit.com/r/ClaudeCode/comments/1nac5lx/am_i_the_only_nonvibe_coder_who_still_thinks_cc/

More recent threads show the same pattern continuing, with users saying it is not merely that the model is “dumber,” but that it is adhering to instructions less reliably in the same repo and workflow.
Source: https://www.reddit.com/r/ClaudeCode/comments/1rxkds8/im_going_to_get_downvoted_but_claude_has_never/

So no, this is not just one angry OP anthropomorphizing. The repeated pattern itself is already established well enough to be discussed seriously.

More importantly, Anthropic itself later published a postmortem stating that between August and early September 2025, three infrastructure bugs intermittently degraded Claude’s response quality. That is a direct company acknowledgment that at least part of the degradation users were complaining about was real and service-side. This is the key point that should end the lazy “it was all just user error” dismissal.

Source: https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues

Anthropic also said in that postmortem that they do not reduce model quality due to demand, time of day, or server load. That statement is relevant, and anyone trying to be fair should include it. At the same time, it does not erase the larger lesson, which is that user reports of degraded quality were not imaginary. They were, at least in part, tracking real problems in the system.

There is another reason the “just prompt better” response is inadequate. Claude Code’s own changelog shows fixes for token estimation over-counting that caused premature context compaction. In plain English, there were product-side defects that could make the system compress or mishandle context earlier than it should, which is exactly the kind of thing users would experience as sudden “lobotomy,” laziness, forgetfulness, shallow planning, or loss of continuity.

Source: https://code.claude.com/docs/en/changelog

Recent bug reports also describe context-limit and token-calculation mismatches that appear consistent with premature compaction and context accounting problems.

Source: https://github.com/anthropics/claude-code/issues/23372

This means several things can be true at the same time:

  • A bad prompt can hurt results.
  • A huge context can hurt results.
  • A messy repo can hurt results.
  • And the platform itself can also have real regressions that degrade output quality.

These are not mutually exclusive explanations. The constant Reddit move of taking one generally true point, such as “LLMs are nondeterministic” or “context matters,” and using it to dismiss repeated, time-clustered regressions is not serious analysis. It is rhetorical deflection.

Now to the harder question, which is mechanism. Is it technically plausible that a model provider with finite compute could alter serving characteristics during periods of constraint, whether through quantization, routing, batching, fallback behavior, more aggressive context handling, or other inference-time tradeoffs? Obviously yes. This is not some absurd idea. Serving large models is a constrained optimization problem, and lower-precision inference is a standard throughput and memory lever in modern LLM serving stacks. Public inference systems such as vLLM explicitly document FP8 quantization support in exactly that context. So the general hypothesis that capacity pressure could change serving behavior is not delusional. It is technically normal to discuss.

Source: https://docs.vllm.ai/en/stable/features/quantization/fp8/

But this is the part where I want to stay disciplined. The public record currently supports “real service-side regressions” more strongly than it supports “Anthropic intentionally served a degraded version of the model to save compute.” Anthropic’s postmortem points directly to infrastructure bugs for the August to early September 2025 degradation window. Their product docs and bug history also point to context-management and compaction-related issues that could independently explain a lot of the user experience. That does not make compute-saving hypotheses impossible.
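To make the quantization lever concrete, here is a toy Python sketch. It truncates mantissa bits of a float as a crude stand-in for low-precision inference; this is obviously not Anthropic’s or vLLM’s actual code, and real FP8 quantization is more sophisticated, but it shows why precision is a genuine quality-versus-throughput tradeoff:

```python
import struct

def quantize(x: float, mantissa_bits: int) -> float:
    # Reinterpret the float64 as raw bits, zero out the low mantissa bits,
    # and reinterpret back: a crude truncation-based "quantizer."
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    drop = 52 - mantissa_bits  # float64 carries 52 explicit mantissa bits
    bits = (bits >> drop) << drop
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

w = 0.123456789
print(quantize(w, 23))  # roughly float32-like precision: error is tiny
print(quantize(w, 3))   # aggressive quantization: error becomes visible
```

To be clear, this only demonstrates that the lever exists in principle, not that anyone at Anthropic pulled it.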
It just means that the strongest public evidence currently lands at “real regressions happened,” not yet at “we can publicly prove the exact internal cost-saving mechanism.”

So the practical conclusion is this: It is completely legitimate to say that repeated quality regressions in Claude Code and Opus were real, that users were not imagining them, and that “skill issue” is not an adequate blanket response. That much is already supported by user reports plus Anthropic’s own acknowledgment of intermittent response-quality degradation. It is also legitimate to discuss compute allocation, serving tradeoffs, routing, fallback behavior, and quantization as serious possible mechanisms, because those are normal engineering levers in large-scale model serving. But we should be honest that, in public, that remains a mechanism hypothesis rather than something fully demonstrated in Anthropic’s case.

What I do not find credible anymore is the reflexive Reddit response that every report of degradation can be dismissed with one of the following:
  • “bad prompt”
  • “too much context”
  • “your repo sucks”
  • “LLMs are nondeterministic”
  • “you are coping”
  • “you are anthropomorphizing”

Those can all be relevant in individual cases. None of them, by themselves, explains repeated independent reports, clustered time windows, official acknowledgments of degraded response quality, or product-side fixes related to context handling.

If people want this thread to be useful instead of tribal, I think the right way to respond is with concrete reports in a structured format:
  • Approximate date or time window
  • Model and product used
  • Task type
  • Whether context size was unusually large
  • What behavior had been working before
  • What behavior changed
  • Whether switching model, restarting, or reducing context changed the result

That would produce an actual evidence base instead of the usual cycle where users report regressions, defenders deny the possibility on principle, and months later the company quietly confirms some underlying issue after the community has already spent weeks calling everyone delusional.

Sources for anyone who wants to check rather than argue from instinct:

  • Anthropic engineering postmortem on degraded response quality between August and early September 2025: https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues
  • Anthropic Claude Code changelog, including a fix for the token-estimation over-counting that triggered premature context compaction: https://code.claude.com/docs/en/changelog
  • Reddit thread, “Open Letter to Anthropic,” describing a precipitous drop in Claude Code quality: https://www.reddit.com/r/ClaudeCode/comments/1m5h7oy/open_letter_to_anthropic_last_ditch_attempt/
  • Reddit thread acknowledging “that one week” in late August 2025 when Opus quality dropped badly: https://www.reddit.com/r/ClaudeCode/comments/1nac5lx/am_i_the_only_nonvibe_coder_who_still_thinks_cc/
  • Recent Reddit discussion saying the issue is degraded instruction adherence in the same repo and setup: https://www.reddit.com/r/ClaudeCode/comments/1rxkds8/im_going_to_get_downvoted_but_claude_has_never/
  • Recent bug report describing token accounting and premature context compaction problems: https://github.com/anthropics/claude-code/issues/23372
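One last footnote for anyone who found the compaction point abstract: here is a minimal sketch of why a token estimator that over-counts makes compaction fire early. Every name and number below is illustrative; this is not Claude Code’s actual implementation, just the shape of the failure mode.

```python
CONTEXT_LIMIT = 200_000    # illustrative context window, in tokens
COMPACT_THRESHOLD = 0.9    # illustrative policy: compact at 90% estimated usage

def should_compact(true_tokens: int, overcount_factor: float = 1.0) -> bool:
    """Return True if the (possibly inflated) token estimate triggers compaction."""
    estimated = true_tokens * overcount_factor
    return estimated > CONTEXT_LIMIT * COMPACT_THRESHOLD

# A session really using 150k tokens, i.e. 75% of the window:
print(should_compact(150_000, overcount_factor=1.0))  # accurate estimate: False
print(should_compact(150_000, overcount_factor=1.3))  # 30% over-count: True
```

An over-counting estimator makes the system throw away context it still had room for, which users would experience as sudden forgetfulness even though the model itself never changed.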

Originally posted by u/No-Loss3366 on r/ClaudeCode