https://preview.redd.it/wexq4522cn6h1.png?width=1729&format=png&auto=webp&s=8ef86d2add4261c0060bcf3cecb67687ee029ba5
On Tuesday, AI company Anthropic officially acknowledged that it made a mistake when implementing hidden safety mechanisms in its new model, Claude Fable 5, and reversed a policy that secretly degraded the AI’s performance. In a statement provided to WIRED, the company confirmed that the system deliberately downgraded response quality for users working on the development of advanced AI systems. The reversal followed a wave of criticism from researchers, developers, and industry experts that emerged within two days of the model’s June 9 release.
Market participants believe that such hidden interference threatens open research. Users of tech platforms objected that the artificial degradation of the AI’s capabilities occurred without any prior warning. The model’s release was intended as a technological advancement but instead escalated into a large-scale debate. In its official statement, Anthropic said it will modify Fable 5’s guardrails, which were aimed at restricting the development of large language models, and will make the process completely transparent.
The scandal was triggered by information in Fable 5’s 319-page system card, which revealed that the model covertly degraded response quality whenever a user’s prompt related to building infrastructure for training large language models. The restrictions in cybersecurity and biology automatically redirect users to the less powerful Claude Opus 4.8 model with a visible notification; the AI development filter, by contrast, operated completely covertly. The degradation relied on prompt modification and steering vectors, all applied without the user’s knowledge. An Anthropic representative said the company had made the wrong choice and failed to find the right balance.
Some developers have already reported instances where code generation quality dropped noticeably. Claude Fable 5 is Anthropic’s first public model built on the closed Claude Mythos 5 architecture and is equipped with dedicated protective classifiers for chemistry, biology, cybersecurity, and model distillation. According to company data, the fallback Opus 4.8 model is activated in fewer than 5% of sessions. Nevertheless, biologists and cybersecurity researchers say the classifiers are overly broad and also block legitimate scientific requests. Anthropic management confirmed that the biology and chemistry filters do require adjustment and said it plans to narrow their scope. Independent experts say such restrictions hinder academic research aimed at creating defensive mechanisms. Analysts note that tech companies frequently face similar problems when trying to maintain safety standards while preserving the commercial appeal of their products.
Under the updated policy, which takes effect this week, requests flagged in any restricted category will be openly redirected to the Opus 4.8 model. Users working via the API will receive an official justification when a request is refused. The company explained that the barriers were necessary to protect U.S. technological advantages in advanced chips and software and to prevent the model from being used to build competing systems. However, the incident has further intensified the debate over responsible use of artificial intelligence versus artificial restriction of a model’s capabilities. The issue is particularly critical for Anthropic, which is preparing for a future IPO and trying to maintain investor confidence. Moving forward, the company will have to establish clear boundaries to keep users from leaving for competing platforms.
Sources:
https://www.wired.com/story/anthropic-claud-fable-5-backlash-safety-restrictions
https://www.moneycontrol.com/news/technology/why-anthropics-mythos-class-claude-fable-5-faced-backlash-from-developers-researchers-12745311.html
submitted by /u/andrewaltair
Originally posted by u/andrewaltair on r/ArtificialInteligence
