I’m starting to think most “agent bugs” aren’t bugs. They’re mismatches between what we think we asked and what the agent thinks we asked.

That got me thinking about how we frame agent observability. Most of the conversation treats the gap between what an agent claims it’s doing and what it actually does as a governance problem: catch bad actions, stop the agent before it deletes the wrong database. That’s real. But I’m seeing something else. A lot of developers are using the same idea for a completely different purpose: debugging their own assumptions about the model.

Examples I keep hearing:

- Someone spent weeks debugging ranking issues, only to realize the prompt wasn’t being interpreted the way they thought.
- Output drift that wasn’t a bug: the agent was doing exactly what it believed it was asked to do.
- Instruction-following gaps where the agent technically followed the instructions, just not in the way the operator expected.

In all these cases, the developer wasn’t catching the agent. They were catching themselves. The most useful signal wasn’t the output. It was reconstructing: what did I think I asked vs what did the agent think I was asking? (There’s a rough sketch of what that can look like at the end of this post.)

That makes me wonder if the “failure/incident” framing for observability is too narrow. “Intent vs execution” might not just be for governance. It might be one of the most useful debugging primitives for everyday agent work.

Curious how others are handling this:

- Are you debugging prompt interpretation / output drift by reconstructing the agent’s understanding?
- What does that look like in practice? Logs, eval traces, reruns, something else?
- Does “claim vs action” resonate here, or does it feel like the wrong vocabulary outside governance?

(For context, I’ve been exploring this space and built a small open-source tool around it. Happy to share if relevant, but mostly interested in whether this pattern resonates.)
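To make the pattern concrete, here’s a minimal sketch of what I mean by reconstructing the agent’s understanding. Everything in it is illustrative: the `llm` callable, the prompt wording, and the JSONL log format are my own stand-ins, not any particular framework’s API. The idea is just to capture the agent’s reading of the task *before* it executes, then record operator intent, agent interpretation, and actual output side by side so you can diff them.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical prompt: ask the agent to restate the task before acting.
INTERPRETATION_PROMPT = (
    "Before doing anything, restate in your own words: "
    "(1) what you believe the task is, "
    "(2) any assumptions you are making, "
    "(3) what you will NOT do."
)

@dataclass
class IntentRecord:
    operator_intent: str       # what I think I asked
    agent_interpretation: str  # what the agent thinks I asked
    agent_output: str          # what it actually produced

def run_with_intent_log(llm, system_prompt: str, task: str,
                        operator_intent: str) -> IntentRecord:
    """`llm` is any callable that takes system/user strings and
    returns a completion string -- swap in your client of choice."""
    # Step 1: capture the agent's reading of the task before execution.
    interpretation = llm(system=system_prompt,
                         user=f"{task}\n\n{INTERPRETATION_PROMPT}")

    # Step 2: run the real task as usual.
    output = llm(system=system_prompt, user=task)

    # Step 3: log intent vs interpretation vs execution side by side,
    # so drift shows up as a diff you can read, not a vibe.
    record = IntentRecord(operator_intent, interpretation, output)
    with open("intent_log.jsonl", "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
    return record
```

In my experience the interesting failures show up as a mismatch between `operator_intent` and `agent_interpretation` long before the output itself looks wrong, which is the “catching yourself” part.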
Originally posted by u/rohynal on r/ArtificialInteligence
