Original Reddit post

I have been running Claude Code as my daily driver on one monorepo for about 3 months: 1,578 commits, roughly 1.1M lines of first-party code across ~4,200 source files. The work is not toy projects. It is production: immersive real-estate web apps (Next.js), a stack of 3D Gaussian-splatting reconstruction pipelines, Python/FastAPI ML services, and a fine-tuned domain LLM. Here is what actually moved the needle, and the stuff I wish I had known on day one.

  1. The model confidently invents features that do not exist. This is the one that cost me the most rework. I ran a multi-agent research pass to find new setup ideas, and it came back recommending Claude Code “features” that were partly or fully made up: a hook event that could redact secrets from the transcript (it is display-only), a per-session token budget (real, but API-only, not in the CLI), and a blocking mechanism that was simply the wrong one. So now I run an adversarial verification pass against the official docs before I wire anything. Treat the model’s confident claims about its own tooling as unverified until a doc says otherwise.
  2. Deterministic gates beat prose rules every time. A rule in your instructions file is a suggestion the model can talk past. A hook that exits non-zero is a wall it cannot. The single biggest reliability jump was moving enforcement out of prose and into hooks: block on detected secrets, warn on destructive commands, run typecheck/lint/build/tests as a real pre-ship gate. If a rule matters, make something fail when it is broken.
  3. Foundation before features, or everything looks templated. Before asking for a single component, feed the project: brand docs, design system, real examples of “good” for this specific brand. Generic input, generic output. The boring hours up front are why the result does not look like every other AI-built site.
  4. Specialist agents with a veto, not one generalist. I run a small set of narrow agents (creative direction, copy, SEO/schema, design, dev, responsive, performance, a final ship gate). The win is not “more bots.” It is that each one carries deep narrow context and can kill work that does not meet its bar, so I am not the bottleneck on every call.
  5. Offload the cheap work to a local model. Commit messages, changelogs, quick explains, summaries: route them to a local model via a hook. Claude does the thinking, the local box does the typing. Real token savings, no quality loss on the boring tasks.
  6. Cost is a feature you have to engineer. At high reasoning effort with parallel agents, spend gets real. Crop screenshots to the region under test instead of sending full-page captures (vision tokens scale with pixels, so this is often a 90%+ cut). Track tool-call burn. Cap runaway loops. Watch it, do not guess.
  7. Treat git and memory as part of the harness. Commit before any experiment so you can revert cleanly. Use isolated git worktrees so parallel agents do not clobber each other. Keep persistent facts in version-controlled files, not the model’s memory. Boring, and it is what lets you move fast without losing work. The throughline: stop trusting the model’s word (about features, about whether a rule held, about what it remembers) and start making the important things verifiable and enforced. That is the difference between a demo and something you can put in front of a paying client. Happy to go deeper on the hook setup, the verification pass, or the local-offload wiring in the comments. submitted by /u/pavloop

Originally posted by u/pavloop on r/ClaudeCode