Original Reddit post

I’ve been running coding and workflow agents in my own setup for the past couple of months and kept running into the same issue: when something went wrong, I couldn’t reconstruct what the agent thought it was doing versus what it actually did. Tool-call logs showed operations, but not the reasoning behind them.

So I added a simple trace layer around my own sessions. On one recent Claude Code run:

- 2,830 events
- 3,256 rule violations (multiple flags can fire per event)

The patterns were consistent:

- no declared intent
- scope expanding across tool calls
- memory writes happening without classification

Most of this never showed up in the logs I was reading.

The biggest shift for me was how it changed the way I debug. Instead of reading tool calls, you start asking:

- what was this agent supposed to be doing?
- where did it stop doing that?

I turned this into a small local tool so I could keep running it across sessions. It’s basically (rough sketch after the post):

- a wrapper around tool calls
- a fixed event schema (intent, scope, context, memory)
- a CLI that summarizes where behavior diverges

No cloud, no accounts, no enforcement. Just visibility.

Appreciate any feedback the community can offer.
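For readers wondering what a trace layer like this could look like, here is a minimal Python sketch of the shape described above: a wrapper around tool calls, a fixed event schema (intent, scope, context, memory), and a summary of where behavior diverges. This is a reconstruction under assumptions, not u/rohynal's actual tool; every name in it (`TraceEvent`, `trace_tool_call`, `summarize`) is hypothetical.

```python
# Hypothetical sketch of a local agent trace layer, not the OP's code.
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

@dataclass
class TraceEvent:
    tool: str                       # which tool was invoked
    intent: Optional[str]           # what the agent declared it was doing
    scope: Optional[str]            # resource/path the call should stay within
    context: dict = field(default_factory=dict)  # free-form call metadata
    memory_class: Optional[str] = None           # classification for memory writes
    ts: float = field(default_factory=time.time)

TRACE: list[TraceEvent] = []

def trace_tool_call(tool: str, intent: Optional[str] = None,
                    scope: Optional[str] = None,
                    memory_class: Optional[str] = None):
    """Decorator that records a TraceEvent around each tool call."""
    def wrap(fn: Callable) -> Callable:
        def inner(*args: Any, **kwargs: Any) -> Any:
            TRACE.append(TraceEvent(tool=tool, intent=intent, scope=scope,
                                    context={"args": repr(args)},
                                    memory_class=memory_class))
            return fn(*args, **kwargs)
        return inner
    return wrap

def summarize(events: list[TraceEvent]) -> dict:
    """Flag the divergence patterns from the post. Multiple flags can
    fire per event, which is why violations can exceed events."""
    flags = {"no_declared_intent": 0, "scope_expanded": 0,
             "unclassified_memory_write": 0}
    seen_scopes: set[str] = set()
    for ev in events:
        if ev.intent is None:
            flags["no_declared_intent"] += 1
        if ev.scope is not None:
            if seen_scopes and ev.scope not in seen_scopes:
                flags["scope_expanded"] += 1     # naive expansion check
            seen_scopes.add(ev.scope)
        if ev.tool == "memory_write" and ev.memory_class is None:
            flags["unclassified_memory_write"] += 1
    return {"events": len(events), "violations": sum(flags.values()),
            "by_rule": flags}

if __name__ == "__main__":
    @trace_tool_call("memory_write")  # no declared intent, no classification
    def memory_write(key: str, value: str) -> None:
        pass

    memory_write("note", "hello")
    print(json.dumps(summarize(TRACE), indent=2))
```

Running this records one event that trips two rules (missing intent, unclassified memory write), which illustrates how a count like 3,256 violations can come out of 2,830 events. A real version would presumably sit between the agent runtime and its tool dispatcher rather than use a decorator, and the CLI would read persisted trace files instead of an in-memory list.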

Originally posted by u/rohynal on r/ArtificialInteligence