Ok, obviously Opus is better, but the size of the difference just surprised me. I have a prompt I run multiple times a day to help me test my app. It requires running multiple commands on macOS, grabbing screenshots, and making sure the app still works. This is a native app, so I can’t automate testing like you normally might with Playwright on a browser app.

Anyway, I’d started the test and it kept getting confused after grabbing the screenshot and making bad decisions. I couldn’t figure out why it was happening. It seemed like maybe the context was full, but I knew it wasn’t. I thought maybe there was a bug in my app. Then I checked the model. It was running Sonnet 4.5. As soon as I changed it to Opus 4.6, it figured out the problem immediately and proceeded like a smart person, not a confused child.

Sometimes it’s hard to tell how much better one model is doing than another, but in this case the difference was super clear. Just wanted to share. I’m curious if others are experiencing the same? That is, sometimes it’s hard to tell the difference between the models, and other times only the latest model can solve your problem.

This makes me extra excited for the next release. What will the next Opus release do that Opus 4.6 can’t, and will it feel as magical as this just did?
Originally posted by u/timc-trainean on r/ClaudeCode
