Original Reddit post

I spent my whole Sunday in experimenting with 4.8 since I was still heavily relying on 4.6 all these days. 4.8 is just a joke:

  • loops
  • missteps
  • lots of apology from it
  • hallucinations despite guardrails in place
  • does not obey harness’ rails The thing is, if you do not babysit it, you are in for bad surprises. Examples: When I was back, read the whole output and saw the blunder it was on. I scolded, pointed out what it was doing wrong. It acknowledged and then did the right thing. Token and time eater! On another deeper analysis work, if I was not reading and analyzing every output, I would not have noticed how lame the outcome was. In 5 occasions I had to ask it to reverify the integrity of its recent work because I am seeing inconsistencies. Here are real outputs of 3 times I called it out and its responses. Claude 4.8 responses after I call it out: —- i) And this is correct, deliberate behavior — which means my committed design was not just broken but naive: auto-running a command = “…” shell hook from a git-tracked file would be a supply-chain hole (clone a repo → its hooks execute on you). The runtime is right to refuse, and I should never have put an executable hook in a scaffold meant to travel via git clone. That’s on me. ii) I also owe you a correction: I kept sloppily saying it “blabla blaaaa”. More honest: “another blabla blaaaa”. iii) Done — “some blabla blaaaa" What this resolves Your instinct was right on both counts, and the testing turned up something better than a fix — it corrected a wrong assumption I’d been building on: “Other blabla blaaaa" —- So be extremely careful with 4.8 as Anthropic seems to have made a joke with us with this pathetic version. They are taken us for granted it seems. At this point, 4.6 was the best ever and I do not think we will get to that level anytime soon or at all. submitted by /u/Captain_Birb

Originally posted by u/Captain_Birb on r/ClaudeCode