eifachposte


I think frontend work exposes a weird weakness in AI coding agents. For backend tasks, failure is often obvious: tests fail, types fail, the API returns the wrong thing. For UI work, an agent can make the app compile and still leave you with something that feels generated:

inconsistent spacing and shadows
default typography
random gradients
components that do not share a design language
no browser screenshots proving the result actually looks right

The useful bar, at least for me, is not “the agent edited the React files.” It is closer to an evidence gate:

Define the visual contract before coding. A DESIGN.md or token file should say what colors, type scales, spacing, radii, shadows, and motion are allowed.
Block generic AI defaults before implementation. If the result drifts into the same purple-gradient / three-card / random-shadow SaaS pattern, that should fail before “done.”
Verify in a real browser, not just with a build. Capture screenshots at mobile/tablet/desktop widths, check empty/loading/error states, and verify interactions instead of trusting a static code diff.
If there is a reference target, use visual diff as a map, not a verdict. Hotspots should tell the reviewer where to inspect; a high similarity score should not override clipped text, broken layout, or fake parity.
Make the final answer cite evidence. “Done” should point to screenshots, logs, test output, or a visual QA artifact, and it should say what is still uncertain.

I’m building this into a small MIT Codex plugin/CLI called Superloopy. I’m the developer, so this is partly a project post, but the underlying idea is the part I’d like feedback on.

Recent work added a superloopy-frontend skill that tries to make frontend work better by requiring a design-token contract, anti-slop checks, a 92-entry brand/style reference library, design-system compliance checks, screenshot evidence, and visual QA before the agent can claim the UI is done. The same pattern also shows up in the research and clone skills:

research: cited synthesis, expansion waves, claim ledger, verification artifacts
authorized website rebuilds: screenshots, DOM/topology, computed styles, assets, component specs, build output, visual QA

Repo for context: https://github.com/beefiker/superloopy

Question: if you use AI agents for product/frontend work, what evidence would actually make you trust the final answer? Screenshots? Design-token compliance? Visual diffs? Lighthouse? A human checklist? Something else?
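To make the "visual contract" gate concrete, here is a minimal sketch of a token compliance check. It is not how Superloopy does it. The allowed colors, the spacing scale, and the function name `find_token_violations` are all invented for illustration, and a real check would likely parse the contract from DESIGN.md or a token file and use a proper CSS parser instead of regexes.

```python
import re

# Hypothetical token contract. In practice this might be loaded from DESIGN.md
# or a tokens file; these values are made up for the example.
ALLOWED_COLORS = {"#0f172a", "#f8fafc", "#2563eb"}
ALLOWED_SPACING_PX = {0, 4, 8, 12, 16, 24, 32}

# Rough patterns: 3- or 6-digit hex colors, and margin/padding/gap declarations.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b|#[0-9a-fA-F]{3}\b")
SPACING_DECL = re.compile(r"(?:margin|padding|gap)[\w-]*\s*:\s*([^;}]+)")
PX_VALUE = re.compile(r"(\d+)px")


def find_token_violations(css: str) -> list[str]:
    """Return readable violations for colors and spacing outside the contract."""
    violations = []
    for line_no, line in enumerate(css.splitlines(), start=1):
        # Any hex color that is not an approved token is reported.
        for color in HEX_COLOR.findall(line):
            if color.lower() not in ALLOWED_COLORS:
                violations.append(f"line {line_no}: color {color} is not a token")
        # Any whole-pixel spacing value that is off the scale is reported.
        for declaration in SPACING_DECL.findall(line):
            for px in PX_VALUE.findall(declaration):
                if int(px) not in ALLOWED_SPACING_PX:
                    violations.append(f"line {line_no}: spacing {px}px is off-scale")
    return violations


SAMPLE_CSS = (
    ".card { padding: 16px; color: #0f172a; }\n"
    ".hero { padding: 13px 24px; background: #7c3aed; }\n"
)

for violation in find_token_violations(SAMPLE_CSS):
    print(violation)
```

The point of a check like this is that the off-contract purple and the 13px padding fail mechanically, before anyone has to argue about whether the page "feels generated."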
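The "visual diff as a map, not a verdict" point can be shown with a toy example. This sketch assumes two same-size grayscale images stored as nested lists, so it runs without an imaging library; the names `diff_hotspots` and `overall_similarity`, the tile size, and the threshold are all hypothetical choices. Real screenshots would need decoding and probably a perceptual metric.

```python
def overall_similarity(reference, candidate):
    """Global score in 0..1: one minus the mean absolute pixel difference."""
    total = sum(
        abs(a - b)
        for ref_row, cand_row in zip(reference, candidate)
        for a, b in zip(ref_row, cand_row)
    )
    pixels = len(reference) * len(reference[0])
    return 1 - total / (pixels * 255)


def diff_hotspots(reference, candidate, tile=2, threshold=0.1):
    """Return (top, left, score) for tiles whose mean difference exceeds threshold."""
    height, width = len(reference), len(reference[0])
    if len(candidate) != height or any(len(row) != width for row in candidate):
        raise ValueError("images must have the same dimensions")
    hotspots = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            total = 0
            count = 0
            # Accumulate the absolute difference inside this tile only.
            for y in range(top, min(top + tile, height)):
                for x in range(left, min(left + tile, width)):
                    total += abs(reference[y][x] - candidate[y][x])
                    count += 1
            score = total / (count * 255)
            if score > threshold:
                hotspots.append((top, left, round(score, 3)))
    # Worst tiles first, so the reviewer knows where to look.
    return sorted(hotspots, key=lambda spot: spot[2], reverse=True)


# A 4x4 white reference and a candidate with one black pixel in the corner.
REFERENCE = [[255] * 4 for _ in range(4)]
CANDIDATE = [row[:] for row in REFERENCE]
CANDIDATE[3][3] = 0

print(overall_similarity(REFERENCE, CANDIDATE))
print(diff_hotspots(REFERENCE, CANDIDATE))
```

The global score here stays above 0.93 even though one tile differs by a quarter of its full range. That is why a high similarity number alone should not close the review: the hotspot list is what points a human at the region that might be clipped text or a broken layout.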
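For the last gate ("done" has to cite evidence), one possible shape is a report object that the agent must fill in, plus a check that lists what is missing. This is only a sketch under my own assumptions: the viewport and state names, the `CompletionReport` class, and `missing_evidence` are hypothetical and not taken from the plugin.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Optional

# Hypothetical requirements; names are illustrative only.
REQUIRED_VIEWPORTS = ("mobile", "tablet", "desktop")
REQUIRED_STATES = ("default", "empty", "loading", "error")


@dataclass
class CompletionReport:
    # Each screenshot is recorded as (viewport, state, path_to_artifact).
    screenshots: list[tuple[str, str, str]] = field(default_factory=list)
    # None means the agent never stated its uncertainties; [] means "none known".
    uncertainties: Optional[list[str]] = None


def missing_evidence(report: CompletionReport) -> list[str]:
    """List what still blocks a 'done' claim; an empty list means the gate passes."""
    covered = {(viewport, state) for viewport, state, _ in report.screenshots}
    gaps = [
        f"no screenshot for {viewport}/{state}"
        for viewport, state in product(REQUIRED_VIEWPORTS, REQUIRED_STATES)
        if (viewport, state) not in covered
    ]
    if report.uncertainties is None:
        gaps.append("report does not say what is still uncertain")
    return gaps


partial = CompletionReport(screenshots=[("desktop", "default", "shots/desktop-default.png")])
for gap in missing_evidence(partial):
    print(gap)
```

With only a desktop screenshot of the default state, the check reports every other viewport/state pair plus the missing uncertainty statement, so the agent cannot claim "done" on the strength of one happy-path capture.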

Originally posted by u/Simple_Somewhere7662 on r/ArtificialInteligence

AI-built UIs need evidence gates: design tokens, screenshots, visual QA
