Original Reddit post

Wanted to check in with the community to see whether this is just a placebo effect or whether anyone else is tracking noticeable shifts in their workflows. Over the last couple of weeks, right around the time Claude Opus 4.8 dropped, it feels like the stability of older models has changed. Tasks that previously executed flawlessly in a single prompt are now frequently hitting walls:

Increased Hallucinations: Confident but objectively incorrect outputs on established tasks.
Logic Loops: Getting stuck in circular reasoning or defaulting to passive-aggressive apologies rather than troubleshooting.
Context Bleeding: Dropping instructions mid-conversation that older builds used to hold perfectly.

In a head-to-head comparison against other models in production right now (like GPT-5.5), the performance degradation feels highly localized to these older Claude endpoints. If the underlying weights weren’t touched, it makes me wonder whether backend compute allocations or hidden system prompts have shifted.

What is your experience so far? Are you seeing this in your workflows? If you are running automated pipelines or heavy chat sessions on older models like 4.6, has your error rate ticked up recently?

What’s the likely technical cause? Is this typical behavior when a lab optimizes infrastructure for a new flagship rollout (like 4.8’s adaptive thinking), or is it more likely a safety patch over-correction?

How are you adjusting? Have you had to rewrite your prompt frameworks to compensate, or are you starting to look at alternative models for your stable pipelines?

Curious to hear whether anyone has run actual eval data on this over the last month or whether it’s just a localized fluke.

Originally posted by u/qilipu on r/ClaudeCode

  • iconic_admin@lemmy.world
    5 hours ago

    Sonnet has been noticeably dumber in my estimation. Like it’s not even trying anymore. It gets easy things wrong. It makes things up all the time. It refuses to search and check itself unless explicitly asked to, and it just goes with “it’s probably x,” which is absolutely incorrect.