Original Reddit post

I’m finding that as you do any type of security review work in a code base the density of words like red-team, attack, bypass, leak hits a point where it becomes impossible for agents to work in it without getting cyber-policy blocked in 2-3 messages. And its not much, like maybe a dozen hits in files that it is likely to open while exploring the repo. One particularly bad case was a vanilla run of /security-review (an official anthropic skill, I might add) linked “attack_tests.py” in CLAUDE.md which killed the session… And then killed every subsequent session on message 1 because CLAUDE.md contained the word attack. That case was funny, but this is really annoying if it sneaks up on you. Over a session the offensive words can sneak in over time across many files and by the time you notice it can no longer be repaired. “Remove references to red-teaming and attacking from the docs” will instantly kill your session. Your only option is to spin up another AI to re-word everything. And yes, I applied for (and received) CVP verification. But it leaves a bad taste in my mouth knowing that if too many “bad words” slip into my repo other Claude users won’t be able to work on it. submitted by /u/691175002

Originally posted by u/691175002 on r/ClaudeCode