If you fine-tune an LLM on one topic, then fine-tune it on another, it forgets the first one. This is called catastrophic forgetting, and it's one of the biggest unsolved problems in production AI. I've been working on this for a while and wanted to share actual benchmark numbers.

**The test:** Train Mistral-7B sequentially on 5 domains (medical, legal, financial, code, science) and measure how much each domain degrades after all 5 are done.

**Results (3-seed average):**

| Method | Avg drift |
|---|---|
| Standard LoRA | +43.0% |
| Frozen (no learning) | +1.95% |
| Constrained adapter | -0.16% |

Positive = the model forgot. Negative = it actually got slightly better (positive transfer).

**Per domain:**

| Domain | Constrained | LoRA |
|---|---|---|
| Medical | -0.09% | +128% |
| Legal | -0.17% | +37% |
| Financial | -0.13% | +19% |
| Code | -0.14% | +15% |
| Science | +0.01% | -0.05% |

The constrained adapter limits how gradients update during each new domain so older knowledge isn't overwritten. It's not freezing the model: a frozen adapter still drifts +1.95%, while this approach actually shows slight improvement on prior domains.

This matters because right now, most companies either:

- Retrain from scratch every time (expensive)
- Run a separate model per domain (unmanageable)
- Accept that fine-tuning breaks things (risky)

Curious what approaches others have seen for handling this in practice. Most of the CL literature is still academic; real-world production deployments are rare.
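The post doesn't specify the exact constraint mechanism, but one common way to "limit how gradients update" so prior-domain knowledge survives is gradient projection: before each optimizer step for a new domain, remove the gradient's components along directions marked as important for earlier domains. Here's a minimal, dependency-free sketch of that idea. The function name `project_out` and the toy "protected direction" are my own illustration, not anything from the original benchmark.

```python
# Hypothetical sketch of a gradient-projection constraint (one plausible
# reading of "constrained adapter"; the post doesn't give the mechanism).
# Before applying an update for a new domain, subtract the gradient's
# component along each direction deemed important for prior domains,
# so parameters serving older knowledge drift less.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(grad, protected_dirs):
    """Remove grad's components along each unit-norm protected direction."""
    g = list(grad)
    for d in protected_dirs:
        coeff = dot(g, d)
        g = [gi - coeff * di for gi, di in zip(g, d)]
    return g

# Toy example: protect the first parameter axis (say, learned by domain 1).
protected = [[1.0, 0.0, 0.0]]
raw_grad = [0.5, -0.2, 0.8]
safe_grad = project_out(raw_grad, protected)
# safe_grad now has zero component along the protected axis,
# so an optimizer step using it leaves that direction untouched.
```

In a real adapter-training loop the protected directions would come from something like the span of past-task gradients or activations, and the projection would be applied per parameter tensor inside the optimizer step; this toy version just shows the core arithmetic.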
Originally posted by u/fourwheels2512 on r/ArtificialInteligence
