eifachposte

Steinberger posted over the weekend about how he doesn't write code anymore; he just designs agent loops. Boris Cherny from Anthropic said basically the same thing: he doesn't prompt Claude, he just creates loops and they handle the rest. If you're at Anthropic and tokens are essentially free, sure, let it loop all day. Most of us are paying real money for every file the agent reads.

Full disclosure: I run a software delivery company and we do a lot of brownfield work, so this is what I'm seeing from that side.

We set up agent loops on a client's core product last quarter. The agents were fast. Four features shipped in a week. PRs looked clean, CI passed, and the team was excited about it.

Then security review caught it. All four features had used a pattern the team had been trying to get rid of for two years. The old pattern was in something like 40+ files across the codebase. The new one existed in maybe 6. The agent looked at what was most common and followed it.

I mean, why wouldn't it? It doesn't know your team has a migration plan. It doesn't read your architecture decision records. It reads your code. And your code told it the deprecated way was the right way, because that's what most of the codebase looked like.

Nobody caught it in code review either, because every PR was functional. The code worked. It was just wrong in a way you'd only notice if you knew the team was actively moving away from that pattern.

On a greenfield project the agent only has your prompt and system instructions to go on. You control the context. On brownfield the codebase is the context, and it drowns out whatever you put in your prompt. 40 files beat one paragraph of instructions every single time.

Everyone throws around the "88% of agent projects fail before production" stat. I think there's a worse number that nobody is tracking: how many reach production and succeed by every visible metric while reintroducing the same tech debt the team was trying to pay down?
Because that's what I keep seeing. Features ship, velocity looks great in the sprint review, and the whole time the codebase is getting worse underneath.

I write a weekly breakdown of what we're seeing across 100+ engineering engagements, if you want to read more on this topic.

Anyway, I'm not saying don't use loops. I'm saying before you point one at an existing codebase, figure out what's in there that you wouldn't want it to learn from. Because it will learn from all of it. It doesn't have opinions about which is which.
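
One way to act on that advice before starting a loop, sketched here as an assumption rather than anything the post describes: count how many files use the pattern you're migrating away from versus the one you're migrating toward, since that ratio is roughly the "majority vote" an agent reading the code will pick up. The sketch assumes a Python codebase, and `legacy_auth_check(` and `require_permission(` are hypothetical stand-ins; the post doesn't say what the actual pattern was. It also includes a crude check on added diff lines, since functional PRs got past review. Regex matching is only a rough proxy and will miss renamed or reformatted uses.

```python
import re
from pathlib import Path

# Hypothetical patterns: replace with whatever your team is migrating away from / toward.
DEPRECATED = {"legacy_auth": r"\blegacy_auth_check\("}
PREFERRED = {"new_auth": r"\brequire_permission\("}


def load_sources(root: str, suffixes: tuple[str, ...] = (".py",)) -> dict[str, str]:
    """Read every file under `root` whose suffix is in `suffixes`."""
    sources = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            sources[str(path)] = path.read_text(encoding="utf-8", errors="ignore")
    return sources


def files_matching(sources: dict[str, str], pattern: str) -> int:
    """Count files containing at least one match of `pattern`."""
    rx = re.compile(pattern)
    return sum(1 for text in sources.values() if rx.search(text))


def audit(sources: dict[str, str]) -> dict[str, int]:
    """Per-pattern file counts, deprecated and preferred together."""
    patterns = {**DEPRECATED, **PREFERRED}
    return {name: files_matching(sources, rx) for name, rx in patterns.items()}


def deprecated_additions(diff_text: str) -> list[str]:
    """Return lines added in a unified diff that match any deprecated pattern."""
    compiled = [re.compile(rx) for rx in DEPRECATED.values()]
    hits = []
    for line in diff_text.splitlines():
        # '+' marks an added line; '+++' is the new-file header, not content.
        if line.startswith("+") and not line.startswith("+++"):
            added = line[1:]
            if any(rx.search(added) for rx in compiled):
                hits.append(added)
    return hits
```

For example, calling `audit(load_sources("src"))` before a run shows whether the deprecated pattern outnumbers the preferred one, and feeding the output of `git diff` into `deprecated_additions` in CI could flag a PR that adds new uses even when every test passes.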

Originally posted by u/Senior_tasteey on r/ArtificialInteligence

Agent loops are great until they learn from your worst code