Maybe I’m in the minority, but for my use cases Opus 4.6 is still better than 4.8, just like it was clearly wayyy better than 4.7. I keep seeing people focus on coding benchmarks and code generation quality, but that’s honestly not what I care about most. What matters to me is reliability, instruction following, consistency, and whether the model actually sticks to my preferences over a long conversation. With 4.6, when I tell it something, it actually remembers the spirit of what I asked for and keeps applying it. It follows instructions more predictably, stays aligned with the requested style, and generally requires less correction. The responses feel more dependable. 4.7 was a complete mess in tgat regard and a clear regression. It always ended up doing funky stuff. 4.8 is technically impressive, especially for coding, but I still find myself getting frustrated when it ignores preferences and drifts from instructions, or starts optimizing for what it thinks I want instead of what I actually asked for. Curious if anyone else feels the same, or if I’m just weirdly sensitive to instruction-following and consistency compared to raw capability. submitted by /u/hatekhyr
Originally posted by u/hatekhyr on r/ClaudeCode
