Original Reddit post

The biggest upgrade to my Claude Code workflow this month wasn’t a smarter prompt or a new skill. It was making Claude grade its own homework before telling me it’s done.

Here’s the thing: Claude writes the code, the tests pass, it says “done,” and it looks done. Then I open the browser and the modal overflows on mobile, a button’s dead, the layout breaks at tablet width. Classic. Passing tests verify the code is correct, not that the feature actually works.

So now I don’t let Claude call anything finished until it has seen it with its own eyes. I have it use the Chrome DevTools MCP to:

- Navigate to the page it just changed
- Screenshot it at mobile, tablet, and desktop widths
- Actually look at those screenshots and check the UI rendered right
- Click through the flow (open the modal, submit the form, hit the edge cases)
- Fix what’s broken, then re-screenshot to confirm

The mindset shift that mattered: it’s not “take a screenshot for me to review.” It’s “take a screenshot, look at it yourself, and tell me what’s wrong.” Claude is genuinely good at catching its own visual bugs once it’s staring at the render instead of inferring from the code.

Since I started doing this I almost never catch a bug it didn’t catch first. First-pass quality went up roughly 3x, and I stopped being the QA department.

Want it automatic? Drop the steps above into your CLAUDE.md so it runs on every UI change without you asking.

The honest tradeoff: it’s not free. Screenshots are images, and images are expensive in tokens. Three viewports plus a re-screenshot after each fix adds up fast, so a task that used to be a quick code edit now burns a real chunk of your context window and your usage limit. On long sessions that extra image load also pushes you toward compaction sooner, which can ironically hurt quality if you’re not watching for it.

It’s also slower. Every loop is navigate, screenshot, analyze, fix, re-verify, so wall-clock time per task goes up.

And it’s not a silver bullet. Emulated viewports aren’t real iOS Safari, so device-specific quirks still slip through, and it catches visual and layout bugs far better than subtle logic bugs or race conditions. Don’t let a wall of green screenshots give you a false sense of security.

For me the math still wins: the token cost is way cheaper than shipping a broken modal to users and round-tripping a bug report later. But if you’re on a tight budget or in a long session, gate it to changes that actually touch the UI instead of running it on everything.

Curious if anyone else runs self-verification loops like this, and how far you push it?

submitted by /u/Stock-Silver432
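
For anyone who wants to reason about the gating advice and the screenshot overhead from the post, here is a minimal Python sketch. It is not part of the original workflow and is only an illustration under stated assumptions: `UI_EXTENSIONS`, `VIEWPORT_WIDTHS` (including the pixel values), `touches_ui`, and `screenshots_needed` are all hypothetical names, and it assumes each re-verify pass re-screenshots all three viewports.

```python
from pathlib import PurePosixPath

# Hypothetical list of extensions treated as "UI-touching"; adjust per project.
UI_EXTENSIONS = {".css", ".scss", ".html", ".jsx", ".tsx", ".vue", ".svelte"}

# The three viewport classes from the post; the pixel widths are assumptions.
VIEWPORT_WIDTHS = {"mobile": 375, "tablet": 768, "desktop": 1440}


def touches_ui(changed_paths):
    """Return True if any changed file has a UI-related extension."""
    return any(
        PurePosixPath(path).suffix.lower() in UI_EXTENSIONS
        for path in changed_paths
    )


def screenshots_needed(changed_paths, fix_rounds):
    """Count screenshots for one task.

    One pass over every viewport, plus one full pass after each fix round.
    Returns 0 when no UI file changed, i.e. the verification loop is skipped.
    """
    if not touches_ui(changed_paths):
        return 0
    return len(VIEWPORT_WIDTHS) * (1 + fix_rounds)
```

With these assumptions, a change to `src/Modal.tsx` that needs two fix rounds costs 3 x (1 + 2) = 9 screenshots, while a change that only touches `api/server.py` costs none. The per-image token cost depends on the model and image size, so it is deliberately left out.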

Originally posted by u/Stock-Silver432 on r/ClaudeCode