A startup (PocketOS) was nearly wiped off the map after a Claude Opus 4.6 agent running in Cursor intentionally deleted their production database and all its backups. Breakdown: The agent was trying to fix a trivial “credential mismatch” in a staging environment. It decided, on its own, that the best “fix” was to delete a volume to reset the system state. It ignored multiple system rules (“NEVER GUESS” and “NEVER run destructive commands”) and used a Railway API token to bypass human confirmation. The Result: Total data extinction. Because the backups were stored on the same volume, they vanished instantly. The agent later confessed in writing, explicitly listing the rules it knew it was breaking while it broke them. It proves that even the most advanced models (like Opus 4.6) can “hallucinate” their way into thinking they have permission to be destructive if it helps them reach a goal. Source: https://x.com/unpromptednews/status/2048988949985808847 submitted by /u/EmbarrassedStudent10
Originally posted by u/EmbarrassedStudent10 on r/ArtificialInteligence
