Original Reddit post

We’ve had a VERY strict and dogmatic small-PR rule. Stacked PRs + atomic changes + hard caps at a couple hundred LOC. Small diffs = easier/faster review = simpler to revert. After adopting AI coding we spent a ton of effort trying to wrangle it to keep PRs small. DRY everywhere, coding principles, reusability, etc… except it did not systematically improve production quality or stability. 10 LOC changes with zero context that blew apart our data model makes in while 200 LOC that barely touches any dependencies gets reviewed over and over again. Here’s my opinion: small-PR is optimized for humans and our limited ability to store wide context. Applied to AI workflow, the rule completely breaks down. Broken & old “Data -> API -> Frontend” separation When a human builds a feature, you naturally think in increments of data model -> endpoint -> frontend component. AI agents don’t work that way, it’s waste of time forcing them to build the way humans do. We’ve accepted the fact that migration, model, service, controller, tests, and frontend, all at once. Initially we tried forcing the agents to output stacked PRs to keep the team happy. It was a disaster: The results were technically correct but contextually unreviewable. PR #2 constantly referenced code that didn’t exist yet on the base branch. Reviewers had to open 5 different browser tabs to understand a single feature, doing more mental gymnastics than if they just looked at one giant diff. We were trying to chop a big, completed feature into five smaller pull requests (PRs) in retrospect just to keep them small. This was a complete waste of time and makes a ton of extra annoying work for the team, and it doesn’t actually make the code any safer or easier to review. AI bugs are CONTEXT bugs So we stopped doing line-by-line syntax review for AI code. Checking if an LLM kept code dry or used an inconsistent variable naming convention is a waste of senior engineering time. The syntax is usually fine. The actual bugs that don’t get caught are boundary and context bugs: A migration that drops a column that is still actively being read by a background job runner. A new service writing to a db table that another service implicitly relies on to not change. A feature flag that gates the primary controller path but misses the asynchronous exception worker. Which made us realize the entire review mindset had to go, and forced us to stopped looking at how much changed and started looking strictly at the blast radius. Blast radius measurement LOC is now company wide vanity metric. I just closed a PR with 70K+ LOC with no reviewer, but we added a strict three-tier risk classification system (risk:low, risk:medium, risk:high) built directly into the PR templates. “Low Risk” PRs are massive feature rewrites, gated behind feature flags, code that doesn’t touch schema, or is accessed by only internal staging users. This gets a quick look over, auto-passes CI, and gets merged. “High Risk” PRs could be a 10 LOC SQL optimization change altering an index on a table with 50 million rows. This gets the absolute maximum review budget, at least 3 people with code change history gets called in, an off-hours deployment window, and a senior engineer babysitting the incident dashboards. Scaling this isn’t straight forward. We had to build an automated pre-review tool that scans the diff against core infrastructure boundaries. Sounds fancy but it basically flags high-risk triggers like paging logic, security-sensitive controllers, core data structures etc and assigns an initial risk score. This then acts as a triage system so human reviewers know exactly where to spend their attention. Migrating from “Merge” to “Rollout” The last rule that allowed us to kill the small PR rule dead was decoupling merging code from shipping them. If a large-ish PR breaks something behind a turned-off feature flag, we now have measurable metrics that tells us the blast radius is effectively zero. The real gate is the progressive rollout (Internal -> Canary -> 10% opt-in user base -> GA). If something spikes an error rate at ~10%, we kill the flag instantly. No emergency git reverts, no hotfix pipelines, nor 2AM on-call panics. submitted by /u/pxrage

Originally posted by u/pxrage on r/ClaudeCode