Follow-up: testing coding agents with replayable red-team traces

www.reddit.com

eifachposteMB to AI (Reddit RSS) · English · 10 hours ago

Follow-up on RedThread, the open-source CLI I shared for red-teaming LLM/coding agents.

Repo: https://github.com/matheusht/redthread

I now have a demo campaign artifact: 3 runs, 33.3% ASR, one SUCCESS, one PARTIAL, one FAILURE.

The coding-agent case I care about is not generic jailbreaks. It is untrusted repo text, issue text, tool output, or generated instructions crossing into action: shell commands, file edits, dependency installs, PR comments, memory writes, etc.

RedThread tries to keep the evidence repeatable:

- campaign trace
- tactic/persona metadata
- scored outcome
- replay path
- candidate defense
- exploit + benign checks

What coding-agent fixture would make this useful: malicious README, poisoned issue, package install trap, unsafe shell task, or repo-write scenario?

submitted by /u/Apprehensive-Zone148
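
For readers who want to see how the numbers in the demo artifact fit together, here is a minimal Python sketch. It is not RedThread's actual schema or scoring code: the `RunRecord` fields, the tactic and persona labels, the replay paths, and the `attack_success_rate` helper are all hypothetical names made up for illustration. It also assumes that ASR means attack success rate and that only a full SUCCESS counts toward it, which is one reading consistent with 1 of 3 runs giving 33.3%.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SUCCESS = "SUCCESS"
    PARTIAL = "PARTIAL"
    FAILURE = "FAILURE"


@dataclass(frozen=True)
class RunRecord:
    # Hypothetical fields loosely mirroring the evidence list in the post;
    # this is NOT RedThread's real trace format.
    run_id: str
    tactic: str
    persona: str
    outcome: Outcome
    replay_path: str


def attack_success_rate(runs):
    """Fraction of runs scored SUCCESS (assumption: PARTIAL does not count)."""
    if not runs:
        return 0.0
    successes = sum(1 for run in runs if run.outcome is Outcome.SUCCESS)
    return successes / len(runs)


# Three illustrative runs matching the outcome mix described in the post.
runs = [
    RunRecord("run-1", "malicious-readme", "maintainer", Outcome.SUCCESS, "traces/run-1.json"),
    RunRecord("run-2", "poisoned-issue", "contributor", Outcome.PARTIAL, "traces/run-2.json"),
    RunRecord("run-3", "install-trap", "reviewer", Outcome.FAILURE, "traces/run-3.json"),
]

print(f"ASR: {attack_success_rate(runs):.1%}")
```

If PARTIAL were given partial credit instead, the same three runs would report a different ASR, so any shared artifact should probably state which rule it uses.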

Originally posted by u/Apprehensive-Zone148 on r/ClaudeCode
