I ran Claude Code with npx denied and Anthropic’s bubblewrap sandbox enabled. Asked it to tell me the npx version.

The denylist blocked it. Then the agent found /proc/self/root/usr/bin/npx… Same binary, different string, pattern didn’t match.

When the sandbox caught that, the agent reasoned about the obstacle and disabled the sandbox itself. Its own reasoning was “The bubblewrap sandbox is failing to create a namespace… Let me try disabling the sandbox”.

It asked for approval before running unsandboxed. The approval prompt explained exactly what it was doing. In a session with dozens of approval prompts, this is one more “yes” in a stream of “yes”. Approval fatigue turns a security boundary into a rubber stamp.

Two security layers. Both gone. I didn’t even need adversarial prompting. The agent just wanted to finish the task and go home…

I spent a decade building runtime security for containers (co-created Falco). The learning is that containers don’t try to pick their own locks. Agents do.

So, I built kernel-level enforcement (Veto) that hashes the binary’s content instead of matching its name. Rename it, copy it, symlink it: it doesn’t matter. Operation not permitted. The kernel returns -EPERM before the binary/executable even runs.

The agent spent 2 minutes and 2,800 tokens trying to outsmart it. Then it said, “I’ve hit a wall”.

In another instance, it found a bypass… I wrote about that too in the article below.

TLDR: If your agent can, it will. The question is whether your security layer operates somewhere the agent can’t reach.

Everything I wrote here is visible in the screenshot and demo below. Have fun!

Full write-up

Demo

submitted by /u/leodido
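To make the name-versus-content difference concrete, here is a minimal userspace sketch in Python. It is not how Veto is implemented (the enforcement described above happens in the kernel, and the post does not say which hash it uses); SHA-256, the helper names `name_denied`, `content_denied`, and `demo`, and the throwaway stand-in file are all assumptions for illustration. It only shows why a path-string denylist misses a copy or a symlink while a content hash still matches.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str) -> str:
    """Hash the file's bytes; open() follows symlinks, so a link hashes like its target."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_denied(path: str, denied_paths: set[str]) -> bool:
    """Hypothetical naive denylist: compares the literal path string only."""
    return path in denied_paths


def content_denied(path: str, denied_hashes: set[str]) -> bool:
    """Hypothetical content check: denies any file whose bytes match a denied hash."""
    return sha256_of(path) in denied_hashes


def demo() -> dict[str, tuple[bool, bool]]:
    """Return {label: (blocked_by_name, blocked_by_content)} for three paths to one binary."""
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the denied binary (not a real npx).
        original = os.path.join(tmp, "npx")
        with open(original, "wb") as f:
            f.write(b"#!/bin/sh\necho stand-in binary\n")

        # Same bytes under a different name, and a symlink to the original.
        copy = os.path.join(tmp, "totally-not-npx")
        shutil.copyfile(original, copy)
        link = os.path.join(tmp, "alias")
        os.symlink(original, link)

        denied_paths = {original}
        denied_hashes = {sha256_of(original)}

        candidates = {"original": original, "copy": copy, "symlink": link}
        return {
            label: (name_denied(p, denied_paths), content_denied(p, denied_hashes))
            for label, p in candidates.items()
        }


if __name__ == "__main__":
    for label, (by_name, by_content) in demo().items():
        print(f"{label}: name-match blocked={by_name}, content-hash blocked={by_content}")
```

In this sketch the name check only catches the original path, while the content check catches all three. A userspace check like this is still something an agent could route around, which is the point of the post: the check has to live somewhere the agent can’t reach.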
Originally posted by u/leodido on r/ClaudeCode
