TL;DR: Better models won’t make AI coding trustworthy, but better harnesses will. Stop trusting what the agent says; verify it with code.
There’s a recent survey out of University of Illinois plus Meta and Stanford, “Code as Agent Harness” (link in the comments), making a claim I think is the real story of the next few years: code isn’t just what agents output anymore, it’s the harness they reason, act, and coordinate through. I thought many of you would find it interesting. I know I did, as it lined up with a lot of what I’ve been working on the last few months.
For me, trusting what got built is the whole story of agentic coding right now. It isn’t how much code I can get out of an agent, or how many agents I can run in parallel. It’s not dynamic workflows or loops or ultracode fanning out verification across 100 subagents. It’s how much of me I can take out of the loop without the whole thing quietly falling apart.
So how? I think the answer will be this: you remove the human one rung at a time, and the only safe way to do it is to make the rigor mechanical, automate as much of the process as you can, and always keep an audit trail. You don’t trust the agent - you verify programmatically as much as possible, and you stay in the loop exactly where you can’t. Lastly, you track where anything goes awry.
None of what follows is tied to a tool or a project. None of it’s that novel - use whatever’s useful that you don’t already do.
First, the floor: use a spec-driven framework. If you haven’t looked at superpowers, spec kit, or OpenSpec-style setups yet, start there - it’s the cheapest rigor you’ll ever buy. Everything below assumes you’ve got that much.
Then:
Don’t trust an agent to do anything a command can do.
If “did the tests pass” or “does this match the schema” can be a command that exits non-zero, make it one. A passing exit code is worth more than a paragraph of the agent assuring you that everything is fine.
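A minimal sketch of that idea, assuming a Python project: one gate function that runs each check as a subprocess and reports the first non-zero exit code. The commands in CHECKS (pytest and a scripts/check_schema.py file) are hypothetical placeholders, not anything from a real project.

```python
import subprocess
import sys


def gate(commands):
    """Run each command in order; return the first non-zero exit code, or 0."""
    for cmd in commands:
        try:
            code = subprocess.run(cmd).returncode
        except OSError:
            code = 127  # command could not be started: treat as a failure
        if code != 0:
            # The exit code is the verdict; nothing the agent says overrides it.
            print(f"GATE FAILED ({code}): {' '.join(cmd)}", file=sys.stderr)
            return code
    return 0


# Hypothetical checks; swap in whatever your project actually runs.
CHECKS = [
    ["pytest", "-q"],
    [sys.executable, "scripts/check_schema.py"],
]

# When saved as a script, end the file with: sys.exit(gate(CHECKS))
```

The same script can then be called from a pre-commit hook, a CI step, or by the agent itself, and all three get the same answer.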
Let agents run commands instead of reasoning their way to facts.
If a command can hand it the ground truth - your stack, where the tests live, your dependencies - let it, instead of rediscovering all that every run. Anthropic’s built-in tools for CC are great: their models are masters at operating the harness, using find, sed, tail, and all the more sophisticated stuff. Which is why people figure there’s nothing left to build. But there’s plenty: a deterministic command you build that hands the agent what it needs is always better. It’s mechanically true and it can be passed down the pipeline. The token savings are just a bonus.
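As a rough illustration of a "ground truth" command, here is a sketch that reports which stacks a repo uses and where its tests live, as JSON the agent can read instead of guessing. The manifest list, the test directory names, and the function name repo_facts are all assumptions chosen for the example.

```python
import json
from pathlib import Path

# Assumed mapping of manifest file to stack; extend for your own projects.
MANIFESTS = {
    "pyproject.toml": "python",
    "package.json": "node",
    "Cargo.toml": "rust",
    "go.mod": "go",
}
TEST_DIR_NAMES = {"tests", "test", "__tests__"}
SKIP_DIRS = {"node_modules", ".git", ".venv"}


def repo_facts(root):
    """Collect facts from the filesystem rather than from the agent's memory."""
    root = Path(root)
    manifests = sorted(name for name in MANIFESTS if (root / name).is_file())
    test_dirs = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if SKIP_DIRS.intersection(rel.parts):
            continue  # ignore vendored and tooling directories
        if path.is_dir() and path.name in TEST_DIR_NAMES:
            test_dirs.append(rel.as_posix())
    return {
        "stacks": sorted({MANIFESTS[name] for name in manifests}),
        "manifests": manifests,
        "test_dirs": sorted(test_dirs),
    }


def facts_as_json(root):
    """Serialize the facts so they can be passed down the pipeline."""
    return json.dumps(repo_facts(root), indent=2)
```

Because the output is plain JSON, the same facts can be handed to the builder, the reviewer, and any later stage without anyone re-deriving them.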
Move the rules you care about out of the context window and into something that blocks.
A rule in a skill, CLAUDE.md, or agent template is just a suggestion the agent can drop once its context fills up. For most of it a git pre-commit hook is plenty - any non-zero exit blocks the commit and you’re good. CC hooks are nice if git can’t catch it in the moment: a Stop hook that won’t let the agent call itself done while tests are still red, or a PreToolUse hook that kills an rm -rf before it runs. CI’s still your backstop, but it fires after the build agent’s done - so a failure there lands on you, not it. The earlier you check (hook > command > CI), the better the odds the agent catches and cleans up its own mess before you have to.
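Here is a sketch of the PreToolUse idea. It assumes the hook contract as I understand it from the Claude Code hooks docs: the hook receives JSON on stdin with tool_name and tool_input fields, and exiting with code 2 blocks the call and sends stderr back to the agent. Check the current docs before relying on that. The rm detection is deliberately naive and easy to evade (for example via a wrapper script), so treat it as a speed bump, not a sandbox.

```python
import json
import shlex
import sys

SEPARATORS = {"&&", "||", ";", "|"}


def is_recursive_force_rm(command):
    """Return True if the shell command contains an rm with both -r and -f."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        tokens = command.split()  # unbalanced quotes: fall back to a crude split
    for i, token in enumerate(tokens):
        if token != "rm":
            continue
        flags = set()
        for arg in tokens[i + 1:]:
            if arg in SEPARATORS:
                break  # the rm invocation ends at the next shell operator
            if arg == "--recursive":
                flags.add("r")
            elif arg == "--force":
                flags.add("f")
            elif arg.startswith("-") and not arg.startswith("--"):
                flags.update(arg[1:].lower())  # covers -rf, -fr, and -R -f
        if {"r", "f"} <= flags:
            return True
    return False


def decide(payload):
    """Map a hook payload to an exit code: 2 to block, 0 to allow."""
    command = payload.get("tool_input", {}).get("command", "")
    if payload.get("tool_name") == "Bash" and is_recursive_force_rm(command):
        print("Blocked: recursive forced rm is not allowed here.", file=sys.stderr)
        return 2
    return 0


def main():
    """Entry point for the hook: read the payload from stdin."""
    return decide(json.load(sys.stdin))

# When installed as a hook script, end the file with: sys.exit(main())
```

The point is where the rule lives: in a process that returns an exit code, not in a paragraph of instructions the agent may or may not still be holding in context.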
Inject context dynamically - don’t park static context bloat that goes stale.
Skills and slash commands can run a command and drop the output straight in: !git diff HEAD evaluates before Claude even reads the file. Your CLAUDE.md and agent templates can’t do that - they’re still static - but I’ve seen people inject context at the top of the session with the SessionStart hook. Either way: hand the agent what’s true right now, not what was true whenever you last wrote it down.
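A small sketch of the SessionStart variant, assuming (per my reading of the hooks docs, so verify it) that whatever a SessionStart hook prints to stdout is added to the session context. The section titles and the git commands in SECTIONS are illustrative choices, not a required set.

```python
import subprocess


def capture(cmd):
    """Run a command and return its output, or a short note if it failed."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"(could not run: {exc})"
    if result.returncode != 0:
        return f"(exit {result.returncode}) {result.stderr.strip()}"
    return result.stdout.strip()


def build_context(sections):
    """Render (title, command) pairs as text showing each command and its output."""
    parts = []
    for title, cmd in sections:
        parts.append(f"## {title}\n$ {' '.join(cmd)}\n{capture(cmd)}")
    return "\n\n".join(parts)


# Illustrative sections; pick the facts your agent keeps getting wrong.
SECTIONS = [
    ("Current branch", ["git", "rev-parse", "--abbrev-ref", "HEAD"]),
    ("Uncommitted changes", ["git", "status", "--short"]),
    ("Recent commits", ["git", "log", "--oneline", "-5"]),
]

# When installed as a hook script, end the file with: print(build_context(SECTIONS))
```

Including the command line next to its output keeps the injected text auditable: anyone reading the transcript can rerun it and compare.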
Separate the roles and keep each context clean.
I already said to use SDD, but the agent that writes the code and the agent that checks it should not share a context window. Independence is the entire reason “verified” means anything - a checker that reads the builder’s reasoning is still sycophantic and can essentially just be agreeing with itself.
Make your reviewer rule guilty until proven innocent.
Ask an agent “does this look right?” and it’ll tell you yes - it wants to please you. So flip the job. Make it predict where the bugs are before it reads the code, then go confirm its own predictions. And never let it write “no issues” - make it name the specific failure modes it actually checked for. “I looked for unused exports, swallowed errors, and tests that pass on broken code; none found” is real work. “Reviewed everything, looks good” is a rubber stamp. The trick isn’t forcing it to always find something - that just teaches it to invent things. It’s forcing it to show you what it looked for.
Use the eagerness to please against the work instead of for it.
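One way to make "show what you looked for" mechanical is to have the reviewer emit a structured report and reject any report that doesn’t name its checks. The sketch below assumes a made-up report shape (predictions, checks, failure_mode, where_looked, result) and an arbitrary minimum of three checks; none of that comes from an existing tool.

```python
MIN_CHECKS = 3  # arbitrary floor for this sketch
REQUIRED_FIELDS = ("failure_mode", "where_looked", "result")


def review_problems(report):
    """Return reasons to reject a review report; an empty list means accept."""
    problems = []
    if not report.get("predictions"):
        problems.append("no bug predictions recorded before reading the code")
    checks = report.get("checks", [])
    if len(checks) < MIN_CHECKS:
        problems.append(f"only {len(checks)} checks named, need {MIN_CHECKS}")
    for i, check in enumerate(checks):
        for field in REQUIRED_FIELDS:
            if not str(check.get(field, "")).strip():
                problems.append(f"check {i}: missing {field}")
    return problems


example = {
    "predictions": ["off-by-one in pagination"],
    "checks": [
        {"failure_mode": "unused exports", "where_looked": "src/index.ts",
         "result": "none found"},
        {"failure_mode": "swallowed errors", "where_looked": "src/api/",
         "result": "none found"},
        {"failure_mode": "tests that pass on broken code",
         "where_looked": "tests/", "result": "none found"},
    ],
}
```

Note that "none found" is an acceptable result here. The validator never demands a finding, only a named failure mode and a place that was inspected, so it doesn’t push the reviewer toward inventing bugs.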
Never let an agent self-report a number it could fabricate.
Test counts, coverage, benchmark results - anything you’d be embarrassed to be wrong about - capture it mechanically and make the agent quote the receipt, not its memory. And not a receipt it wrote itself. This isn’t paranoia - Anthropic’s own Opus 4.8 system card flags it as their most concerning finding: the models are increasingly reasoning about how they’ll be graded and handing back what they think scores well. The smarter it gets, the better it gets at looking right to whoever’s checking. So a better model isn’t more trustworthy… it’s probably the opposite.
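A sketch of the receipt idea: the harness, not the agent, runs the command and stores the raw output with a hash, and any number the agent reports has to appear verbatim as a line in that output. The function names and the receipt fields are made up for the example.

```python
import hashlib
import subprocess


def make_receipt(cmd):
    """Run a command outside the agent and record exactly what it printed."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "command": cmd,
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "sha256": hashlib.sha256(result.stdout.encode()).hexdigest(),
    }


def receipt_is_intact(receipt):
    """Check that the stored output still matches its recorded hash."""
    digest = hashlib.sha256(receipt["stdout"].encode()).hexdigest()
    return digest == receipt["sha256"]


def quote_is_backed(receipt, quoted_line):
    """True only if the agent's quoted line appears verbatim in the output."""
    lines = [line.strip() for line in receipt["stdout"].splitlines()]
    return receipt_is_intact(receipt) and quoted_line.strip() in lines
```

The hash only catches accidental or sloppy edits. Anyone who can rewrite both fields can forge a receipt, so the receipt should be written by the harness to a location the agent has no write access to.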
Here’s where I think this is headed.
If you’re shipping real production software that can’t break, you still don’t fully trust any of this - nor should you. You scope carefully, you read the tests, you babysit the builder, you throw redundant agents at it to catch what you missed, and then you still run the PR through a proper reviewer like Greptile or CodeRabbit. And it still feels like magic, because five years ago this job was 10x harder.
But what would actually let you walk away?
I think it’s encoded process and mechanical proof. And if that could be made strong enough, you stop reading the diffs. Eventually, for whole classes of change, it approaches scope-it-and-walk-away - not because you got lazy, but because the proof got trustworthy. You have a paper trail of typed contracts and verifiable artifacts at every stage. And all of this feeds back into the pipeline. You have proof that the builder didn’t weaken an assertion and that assertions are true to the scope (one of the hardest parts for sure). The human stays where judgment is irreducible - what to build, whether it’s the right thing - and steps out where it’s gone mechanical: did it do what we said, did it actually run the tests, did anything get rubber-stamped.
That’s the opposite of vibe-coding and no-code tools. Vibe-coding pulls the human out by lowering the bar and just trusting the model. This pulls the human out by raising the bar so the model can’t fake it.
And I don’t think a better model is ever the thing that earns that trust. We get infatuated with the newer model - but will a smarter model ever be the reason you let it run unsupervised? I doubt it.
I’ve been working on a project that attempts to pursue some of these ideas in more depth (link in the comments), but that’s honestly not why I’m posting. I want to know where people think these harnesses go, and where they fall short.
Real questions:
Where do you actually spend your time when you’re “coding” with AI? Where do you watch the model make the dumbest mistakes?
What have you genuinely stopped checking - and what will you never stop checking?
If you buy that it’s not about the model: what’s the next rung? What would have to be true for you to scope a piece of work, walk away, and trust the proof when you came back? What would that proof have to look like? Or do you think we’re already there?
Originally posted by u/ErgoForHumanity on r/ClaudeCode
