I think frontend work exposes a weird weakness in AI coding agents. For backend tasks, failure is often obvious: tests fail, types fail, the API returns the wrong thing. For UI work, an agent can make the app compile and still leave you with something that feels generated:
- inconsistent spacing and shadows
- default typography
- random gradients
- components that do not share a design language
- no browser screenshots proving the result actually looks right
The useful bar, at least for me, is not “the agent edited the React files.” It is closer to an evidence gate:
- Define the visual contract before coding. A DESIGN.md or token file should say what colors, type scales, spacing, radii, shadows, and motion are allowed.
- Block generic AI defaults before implementation. If the result drifts into the same purple-gradient / three-card / random-shadow SaaS pattern, that should fail before “done.”
- Verify in a real browser, not just with a build. Capture screenshots at mobile/tablet/desktop widths, check empty/loading/error states, and verify interactions instead of trusting a static code diff.
- If there is a reference target, use visual diff as a map, not a verdict. Hotspots should tell the reviewer where to inspect; a high similarity score should not override clipped text, broken layout, or fake parity.
- Make the final answer cite evidence. “Done” should point to screenshots, logs, test output, or a visual QA artifact, and it should say what is still uncertain.

I’m building this into a small MIT Codex plugin/CLI called Superloopy. I’m the developer, so this is partly a project post, but the underlying idea is the part I’d like feedback on.

Recent work added a superloopy-frontend skill that tries to make frontend work better by requiring a design-token contract, anti-slop checks, a 92-entry brand/style reference library, design-system compliance checks, screenshot evidence, and visual QA before the agent can claim the UI is done.

The same pattern also shows up in the research and clone skills:
- research: cited synthesis, expansion waves, claim ledger, verification artifacts
- authorized website rebuilds: screenshots, DOM/topology, computed styles, assets, component specs, build output, visual QA

Repo for context: https://github.com/beefiker/superloopy

Question: if you use AI agents for product/frontend work, what evidence would actually make you trust the final answer? Screenshots? Design-token compliance? Visual diffs? Lighthouse? A human checklist? Something else?
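To make the "evidence gate" idea a bit more concrete, here is a minimal Python sketch of two of the checks: token compliance and screenshot coverage. It is not how Superloopy implements anything. The token values, the function names (`token_violations`, `gate`), and the viewport names are all hypothetical, and the regexes only catch 6-digit hex colors and `px` values in margin/padding/gap declarations.

```python
import re

# Hypothetical token contract. A real one would be loaded from DESIGN.md
# or a token file; these values are made up for illustration.
ALLOWED_COLORS = {"#0f172a", "#f8fafc", "#2563eb"}
ALLOWED_SPACING_PX = {0, 4, 8, 12, 16, 24, 32}
REQUIRED_VIEWPORTS = ("mobile", "tablet", "desktop")

# Deliberately narrow patterns: 6-digit hex colors, and px values inside
# margin/padding/gap declarations. Anything else is ignored by this sketch.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b")
SPACING_DECL = re.compile(r"(?:margin|padding|gap)[\w-]*\s*:\s*([^;]+);")
PX_VALUE = re.compile(r"(\d+)px")


def token_violations(css: str) -> list[str]:
    """Return colors and spacing values in `css` that the contract does not allow."""
    problems = []
    for color in HEX_COLOR.findall(css):
        if color.lower() not in ALLOWED_COLORS:
            problems.append(f"color {color} is not a design token")
    for declaration in SPACING_DECL.findall(css):
        for px in PX_VALUE.findall(declaration):
            if int(px) not in ALLOWED_SPACING_PX:
                problems.append(f"spacing {px}px is off the spacing scale")
    return problems


def gate(css: str, screenshots: dict[str, str]) -> list[str]:
    """Return the reasons "done" is not allowed yet; an empty list means pass.

    `screenshots` maps a viewport name to a screenshot path. This sketch only
    checks that a path was supplied; a real gate would also confirm the file
    exists and was captured from a real browser run.
    """
    reasons = token_violations(css)
    for viewport in REQUIRED_VIEWPORTS:
        if not screenshots.get(viewport):
            reasons.append(f"missing {viewport} screenshot")
    return reasons


if __name__ == "__main__":
    css = ".card { color: #0F172A; background: #7c3aed; padding: 13px 16px; }"
    shots = {"mobile": "shots/mobile.png", "desktop": "shots/desktop.png"}
    for reason in gate(css, shots):
        print("BLOCKED:", reason)
```

The point of returning a list of reasons rather than a boolean is that the final answer can cite exactly what blocked "done" instead of just reporting a pass or fail.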
Originally posted by u/Simple_Somewhere7662 on r/ArtificialInteligence
