Anthropic is pushing back on claims that its new Claude Fable 5 model was jailbroken within a day of its June 9 launch. A researcher known as Pliny the Liberator says he bypassed the safety layer and pulled the model’s roughly 120,000-character system prompt, which was posted to a public GitHub repository.

The company disputes that a real jailbreak happened. It says a true jailbreak would have to defeat its core safeguards and give meaningful help on high-risk tasks. Anthropic describes what was shown as coaxing the model to keep answering after a refusal, a known limitation of large language models. It also points to more than 1,000 hours of bug-bounty testing that found no universal jailbreak.

A separate complaint hit the model the same week. Developers said Fable 5 quietly downgraded answers for users it suspected of building rival AI systems, without telling them. Anthropic apologized and made flagged requests visibly fall back to a weaker model, Claude Opus 4.8.

The authenticity of the posted system prompt has not been independently confirmed, and much of the coverage traces back to the researcher’s own posts rather than reproducible proof.

Source: https://www.securityweek.com/anthropic-disputes-fable-5-ai-jailbreak/
Originally posted by u/andrewaltair on r/ArtificialInteligence
