I’m testing a simple local QA workflow for Claude-based agents: run cases, compare baseline vs new, and get a single “gate” result for CI. The pain I hit is tool-call regressions: same prompt, different tool sequence, and the bug only shows up later in prod. How are you testing tool-call sequences today? Do you snapshot tool args? Or do you rely on trace UIs and manual checks? submitted by /u/Additional_Fan_2588
Originally posted by u/Additional_Fan_2588 on r/ClaudeCode
You must log in or # to comment.
