Original Reddit post

I’ve been measuring Claude Code loops because the painful part is not always the first pass. The first pass is fast. The expensive part is when Claude says “done,” the session goes cold, and CI/review/a human later finds something it introduced. Now a fresh session has to rediscover the task, the files, and why the change exists before it can fix anything. That second session is the tax. In a benchmark study, I saw this:

  • vanilla Claude Code-style loop: 11/16 stopped with net-new detector-backed debt
  • same setup with a CLAUDE.md self-review rule: 9/16 still stopped dirty
  • deterministic Stop-gate in the loop: 0/16 observed dirty stops Then I measured the cost of deferring repair. Same seeded test-gap task, same final clean state:
  • fix while the original Claude session is still warm: 14.0 turns avg
  • defer the same repair to a fresh cold session: 21.1 turns avg
  • cold-fix premium: ~51% more turns Equivalent-cost estimate was also ~49% higher for the cold fix on that task. That is not literal API billing; it is the Claude CLI equivalent-cost estimate, useful for relative comparison. The practical lesson for me: Don’t let the agent stop dirty. Make it fix the net-new finding while it still has the context. That is what I built dxkit for. dxkit is a Claude Code Stop hook. It baselines the repo, reruns deterministic checks when Claude tries to stop, blocks only net-new findings from this change, and hands the exact finding back to the same warm loop. Free, MIT, local-first. Demo: npx -y @vyuhlabs/dxkit@latest demo loop-guardrail Repo: https://github.com/vyuh-labs/dxkit submitted by /u/That1dudeOnReddit13

Originally posted by u/That1dudeOnReddit13 on r/ClaudeCode