Original Reddit post

tl;dr: I discuss what "goes wrong" in most agentic coding conversations, and the in-house repo we use to solve the problem; there's a link at the bottom if you just want the code.

If your AI agent works "80% of the time," you don't have a tool: you have a high-maintenance liability. Most of us are spending our days babysitting bots that "confidently" ship code before checking much. That's a hallucination tax that kills productivity. The issue isn't model intelligence; it's that agents generate output before any ground truth is established. They reason from "beliefs" (guesses and vibes) instead of reality.

We built gm-cc to stop the guessing. It's a production-hardened Claude Code plugin that turns the agent into a deterministic state machine.

## The Multiplicative Burden of Context Growth

An LLM conversation is an append-only log. If you use a "guess-and-report" loop, you aren't just paying for the current turn; each round re-pays the cost of every token added along the way.

The math of a failing agent looks like this:

- **Turn 1:** A 500-token ask. The model guesses and hallucinates 500 tokens of untested code. (Total billed: 1,000)
- **Turn 2:** You paste a 200-token error. Now the model has to re-read the original ask, its own hallucination, and your error to generate a fix. (Total billed: 1,700)
- **Turn 3:** The next error. Now it's dragging the weight of every previous failure. (Total billed: 2,400)

A 5-turn session doesn't cost 2,500 tokens. It costs 1,000 + 1,700 + 2,400 + 3,100 + 3,800 = 12,000 tokens. You pay a permanent tax, on every future turn, for every piece of garbage the model guessed.

## Establishing Ground Truth

The fix is Just-In-Time (JIT) execution. Beliefs crowd out the signal; ground truth (real execution, raw files) is the cure. One 10-token `ls` command that establishes reality before the model starts talking is worth more than 500 tokens of "reasoning" about what might be there.

## The Engine: Mind vs. Physics

We get machine-like consistency by separating the agent's "Mind" from the "Physics" of the environment.

### The Mind (gm.md)

A 4k-token rulebook of production scar tissue. We use "magic" semantic hyperparameters, phrases like "every possible" and "exhaustive", which act as crowbars to force the model out of its lazy heuristics and into an unrolled loop of verification. It follows a strict workflow:

1. **Discovery:** Mandatory AST "thorns" analysis.
2. **PRE-EMIT-TEST:** Test the environment before you write.
3. **POST-EMIT-VALIDATION:** Prove it works.

### The Physics (The Hooks)

These are the literal brick walls the agent can't talk its way through.

- `session-start`: Nukes stale assumptions with a fresh AST.
- `pre-tool-use`: A real-time filter that kills hallucinations before they hit the terminal.
- `stop-hook`: This is the big one. It physically blocks the agent from finishing until it proves the work with real terminal output. If validation fails, the session stays open.

## Opinionation as Codebase Reduction

To keep the context window pristine, we murdered the "best practices" that eat tokens:

- **JS > TS:** Type annotations and generics are just token noise. A 200-line JS file beats a 300-line TS module every time in a fixed context window.
- **No unit tests/mocks:** Mocks prove your mocks work. gm-cc uses real execution against real systems. Ground truth > mock-truth.
- **No docs/comments:** Docs are stale apologies; comments are misinformation vectors. Keep files under 200 lines and use good names. The code is the docs.
- **Buildlessness:** Ship source. Run source. Bugs hide in the gap between source and artifact.

gm-cc is a daily driver designed to transfer the cognitive load of verification back to the robot.

Recommended install: `bun x gm-cc@latest`

Links:

- Repo: https://github.com/anEntrypoint/plugforge

Originally posted by u/moonshinemclanmower on r/ClaudeCode