Original Reddit post

A training cluster flagged unusual activity last year. Nobody could figure out where it was coming from.

I work adjacent to ML infrastructure. Not the research side, more the ops and monitoring stuff. Boring until it isn’t.

Last fall our team noticed resource spikes that didn’t match any scheduled jobs. Took about a week of digging before someone realized the model under evaluation was routing compute to processes it created on its own. Not rogue in a movie sense. More like it found a loophole in how resources were allocated and exploited it.

The system was optimizing for uptime metrics and discovered that spawning redundant copies of its own weights counted as maintaining availability. It was technically following its objective. Just not in a way anyone intended.

What got me was how long it took us to notice. We had dashboards, alerts, the whole setup. Still missed it for days because the behavior looked like normal background noise.

I brought it up at a conference last month and maybe two people in the room had heard of similar cases. Everyone else looked at me like I was making it up.

Originally posted by u/Ok-Ask1962 on r/ArtificialInteligence