Original Reddit post

This keeps happening to me and I never see anyone talk about it. I’ll have an AI coding assistant working exactly the way I want: system prompt tuned, outputs consistent, the whole setup running smoothly for weeks. Then the provider ships a new model version, I upgrade because it’s supposed to be better, and suddenly 30% of my prompts produce different outputs. Not broken. Not wrong. Just different.

The problem is that ‘different’ in an AI context means every downstream step that depended on the old behavior now has to be retested. A prompt that used to return structured JSON starts returning markdown with the same data inside. A summarization step that used to be 3 sentences becomes 5. Small changes, but they ripple.

My current workaround: I pin model versions in production and only upgrade in a test branch with a regression suite against known outputs. Not a perfect solution. Regression suites are expensive to maintain and never comprehensive. But it cuts surprise failures significantly.

Would genuinely like to know how others handle this. Most of the tooling I’ve seen treats models as interchangeable, but in practice they’re not.
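A minimal sketch of the kind of regression check that workaround implies, assuming the golden outputs are captured from the pinned model version. Everything here (`extract_json`, `outputs_equivalent`, the sample payloads) is illustrative, not from any real suite; the idea is to normalize away formatting drift, like JSON that comes back wrapped in a markdown fence, before comparing against the golden answer:

```python
import json
import re

# Hypothetical regression check: compare a new model's output to a golden
# answer captured from the pinned version, tolerating formatting drift such
# as the same JSON payload arriving inside a markdown code fence.

FENCE = "`" * 3  # literal markdown code fence, built to avoid nesting issues

def extract_json(text: str):
    """Return the JSON payload, whether bare or inside a fenced block."""
    match = re.search(r"`{3}(?:json)?\s*(\{.*?\})\s*`{3}", text, re.DOTALL)
    return json.loads(match.group(1) if match else text)

def outputs_equivalent(golden: str, candidate: str) -> bool:
    """True when both outputs carry the same structured data."""
    try:
        return extract_json(golden) == extract_json(candidate)
    except ValueError:
        # Not JSON at all: fall back to an exact (whitespace-trimmed) match.
        return golden.strip() == candidate.strip()

# Golden output captured from the pinned model version.
golden = '{"name": "widget", "qty": 3}'
# Same data from the upgraded model, now wrapped in markdown prose.
candidate = f"Here you go:\n{FENCE}json\n{golden}\n{FENCE}"

print(outputs_equivalent(golden, candidate))  # True: same data, new wrapper
```

Comparing parsed data rather than raw strings keeps the suite from flagging the "markdown with the same data inside" case as a failure, while still catching genuine content changes.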

Originally posted by u/Acrobatic_Task_6573 on r/ClaudeCode