Original Reddit post

I’ve been digging into the math behind model collapse (the “Ouroboros” effect where AI trains on AI-generated data). The core issue seems to be variance reduction. Since LLMs are designed to output probable tokens, they naturally “smooth out” the distribution of human language. Train a new model on that smoothed output and you lose the “tails” of the distribution: the creativity, edge cases, and nuance. It’s effectively a photocopy of a photocopy.

I visualized how this “data degeneracy” loop works in a short breakdown here: https://youtu.be/kLf8_66R9Fs

Discussion: Do you think we can statistically “re-inject” variance into synthetic data, or is the training corpus already permanently polluted?
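
To make the mechanism concrete, here is a minimal toy sketch of the loop. It assumes a 1-D Gaussian stands in for the data distribution and that cutting off samples beyond 2 sigma stands in for “only output the probable tokens” (i.e. top-p / low-temperature sampling); the sample size and cutoff are arbitrary illustrative choices. Each generation fits a Gaussian to the current corpus, resamples from it with the tails removed, and repeats:

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples = 10_000      # size of each generation's "training corpus"
n_generations = 10
truncation = 2.0        # keep only samples within 2 sigma: a stand-in for
                        # "output only the probable tokens"

# Generation 0: the original "human" data.
data = rng.normal(0.0, 1.0, n_samples)

for gen in range(1, n_generations + 1):
    # "Train" the next model: fit a Gaussian to the current corpus.
    mu_hat, sigma_hat = data.mean(), data.std()

    # Build the next corpus by sampling from that model, but with the tails
    # cut off, mimicking a sampler that favours high-probability outputs.
    samples = rng.normal(mu_hat, sigma_hat, n_samples * 2)
    keep = np.abs(samples - mu_hat) < truncation * sigma_hat
    data = samples[keep][:n_samples]

    print(f"generation {gen:2d}: fitted std = {data.std():.3f}")
```

Run as written, the fitted standard deviation drops from 1.0 to roughly 0.3 within ten generations. The exact numbers depend on the cutoff and sample size, but the direction is the point: every pass through the loop shaves a bit more off the tails.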

Originally posted by u/firehmre on r/ArtificialInteligence