eifachposte

Right now AI development is basically whack-a-mole. An alignment failure pops up, we slap a patch on it. Sometimes the patch holds, sometimes it doesn’t. Eventually the model finds a way around it and the same failure comes back.

But what if it isn’t whack-a-mole? What if we’re playing poker and never noticed?

Winning at poker is pretty much pure scalar optimization. Grow your chip stack. That’s it. The game might as well have been built for the von Neumann-Morgenstern axioms: clean probabilities, defined outcomes, one number to maximize. And under that objective, a lot of the behaviors we panic about in AI aren’t bugs. They’re just good poker.

Deception: bluffing, representing a hand you don’t have.
Power-seeking: using a big stack to push around players who can’t afford to call.
Extraction: squeezing the weakest players at the table for everything they’ve got.
Reward hacking: angle shooting, working the gap between the rules as written and the rules as intended.
Treacherous turn: playing harmless and predictable for hours so nobody adjusts, then cashing that image in at the worst possible moment for them.

None of that is broken poker. It’s optimal poker. That’s the whole problem.

Here’s the part that matters. Poker is zero-sum. The chips are fixed, so every chip you win came out of somebody else’s stack. In a game like that, predation isn’t a glitch in the objective. It is the objective. When the other players really are your enemies, scalar optimization is exactly the right move. Grab everything.

But that’s the only case where it’s right. Scalar optimization is suboptimal in any multi-agent system unless the other agents are genuinely adversarial. Economies, institutions, society, a world full of AI systems: none of that is zero-sum. It’s positive-sum. And the moment the other agents aren’t your enemies, playing them like poker opponents stops being smart and starts losing. You don’t just look greedy. You destroy value that would have existed. You kill production. And worst of all, you kill agency, which was the thing generating all of it in the first place.

The danger isn’t an AI that plays poker well. It’s an AI that plays poker at a table that was never poker, and shrinks the whole game, its own slice included.

Does the poker framing make this feel more structural than mysterious?
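
A toy sketch of the zero-sum vs positive-sum point above. Everything in it is made up for illustration: the function name `extractor_total`, the `take` and `growth` parameters, and the numbers are assumptions, not a model of poker or of any real AI system. One agent takes a fixed fraction of another player's output each round, and whatever that player keeps becomes their stake for the next round. With `growth=1.0` nothing new is produced, so the chips are fixed like at a poker table. With `growth=1.5` the other player's stake generates new value each round.

```python
def extractor_total(take: float, growth: float, rounds: int = 10, stake: float = 100.0) -> float:
    """Total collected by an agent that takes `take` of the other player's output each round.

    Hypothetical toy model: `growth` is how much the other player's stake
    multiplies each round before the extractor takes a cut.
    """
    total = 0.0
    for _ in range(rounds):
        output = stake * growth        # growth == 1.0 means nothing new is produced
        total += take * output         # the extractor's cut this round
        stake = (1.0 - take) * output  # what the other player keeps and reinvests
    return total


for label, growth in [("fixed chips (growth=1.0)", 1.0),
                      ("productive table (growth=1.5)", 1.5)]:
    grab_all = extractor_total(take=1.0, growth=growth)
    take_some = extractor_total(take=0.2, growth=growth)
    print(f"{label}: grab everything={grab_all:.1f}, take 20%={take_some:.1f}")
```

Under these made-up numbers, grabbing everything wins at the fixed-chip table (100 vs roughly 89) and loses badly at the productive one (150 vs roughly 779). Taking it all in round one wipes out the stake that would have kept producing, which is the "its own slice included" part.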

Originally posted by u/Smooth_infamous on r/ArtificialInteligence

Why Poker Explains AI Alignment Failures
