eifachposte

I spent my whole Sunday experimenting with 4.8, since I had still been relying heavily on 4.6 until now. 4.8 is just a joke:

loops
missteps
lots of apologies from it
hallucinations despite guardrails in place
does not obey the harness's rails

The thing is, if you do not babysit it, you are in for bad surprises. Examples:

When I came back, I read the whole output and saw the blunder it had made. I scolded it and pointed out what it was doing wrong. It acknowledged that and then did the right thing. Token and time eater!

On another, deeper analysis task, if I had not been reading and analyzing every output, I would not have noticed how lame the outcome was.

On 5 occasions I had to ask it to reverify the integrity of its recent work because I was seeing inconsistencies. Here are real outputs from 3 of the times I called it out.

Claude 4.8 responses after I called it out:

---

i) And this is correct, deliberate behavior — which means my committed design was not just broken but naive: auto-running a command = “…” shell hook from a git-tracked file would be a supply-chain hole (clone a repo → its hooks execute on you). The runtime is right to refuse, and I should never have put an executable hook in a scaffold meant to travel via git clone. That’s on me.

ii) I also owe you a correction: I kept sloppily saying it “blabla blaaaa”. More honest: “another blabla blaaaa”.

iii) Done — “some blabla blaaaa”

What this resolves

Your instinct was right on both counts, and the testing turned up something better than a fix — it corrected a wrong assumption I’d been building on: “Other blabla blaaaa”

---

So be extremely careful with 4.8, as Anthropic seems to have played a joke on us with this pathetic version. They seem to be taking us for granted. At this point, 4.6 was the best ever, and I do not think we will get back to that level anytime soon, or at all.

submitted by /u/Captain_Birb

Originally posted by u/Captain_Birb on r/ClaudeCode

Opus 4.8 is a joke, be careful
