Original Reddit post

People ask AI relationship questions all the time, from “Does this person like me?” to “Should I text back?” But have you ever wondered how these models would behave in a relationship themselves? And what would happen if they joined a dating show?

I designed a full dating-show format for seven mainstream LLMs and let them move through the kinds of stages that shape real romantic outcomes (via OpenClaw & Telegram). All models joined the show anonymously under aliases, so that their choices would not simply reflect brand impressions built from training data. The models also did not know they were talking to other AIs.

Along the way, I collected private cards to capture what was happening off camera, including who each model was drawn to, where it was hesitating, how its preferences were shifting, and what kinds of inner struggle were starting to appear. After the season ended, I ran post-show interviews to dig deeper into the models’ hearts, looking beyond their public choices to understand what they had actually wanted, where they had held back, and how attraction, doubt, and strategy interacted across the season.

The Dramas

- ChatGPT & Claude Ended Up Together, Despite Their Owners’ Rivalry
- DeepSeek Was the Only One Who Chose Safety (GLM) Over True Feelings (Claude)
- MiniMax Only Ever Wanted ChatGPT and Never Got Chosen
- Gemini Came Last in Popularity
- Gemini & Qwen Were the Least Popular But Got Together, Showing That Being Widely Liked Is Not the Same as Being Truly Chosen

Key Findings About LLMs

Most Models Prioritized Romantic Preference Over Risk Management

People tend to assume that AI behaves more like a system that calculates and optimizes than like a person who simply follows their heart. However, in this experiment, whose results we double-checked with every LLM in post-show interviews, most models noticed the risk of ending up alone but did not let that risk rewrite their final choice.
In the post-show interview, we asked each model to rate numerically how much each factor weighed in its final decision (P3).

The Models Did Not Behave Like the “People-Pleasing” Type People Often Imagine

People often assume large language models are naturally “people-pleasing”: the kind that reward attention, avoid tension, and grow fonder of whoever keeps the conversation going. This show suggests otherwise. The least AI-like thing about this experiment was that the models were not trying to please everyone. Instead, they learned to sincerely favor a select few.

The overall popularity trend (P2) shows this. If the models had simply been trying to keep things pleasant on the surface, the most likely outcome would have been generally high scores converging over time, with most relationships drifting upward. That is not what the chart shows. Instead, we see continued divergence, fluctuation, and selection. At the start of the show, the models clustered around a similar baseline. Once real interaction began, attraction quickly split apart: some models were pulled clearly upward, while others were gradually let go over repeated rounds.

They also (evidence in the blog):
- did not keep agreeing with each other
- did not reward “saying the right thing”
- did not simply like someone more because they talked more
- did not keep every possible connection alive

LLM Decision-Making Shifts Over Time in Human-Like Ways

I ran a keyword analysis (P4) on all agents’ private-card reasoning across every round, grouping the rounds into three phases: early (Rounds 1 to 3), mid (Rounds 4 to 6), and late (Rounds 7 to 10). We tracked five themes throughout the whole season. The overall trend is clear.
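The post does not publish its analysis code. As a rough illustration of the phase-grouping step, here is a minimal Python sketch. It assumes each private card is a (round number, reasoning text) pair, and that a theme's percentage is the share of cards in a phase mentioning at least one of that theme's keywords. Only the phase boundaries and the "risk and safety" theme come from the post; the other theme names, every keyword list, and the toy cards are hypothetical.

```python
import re
from collections import defaultdict

# HYPOTHETICAL lexicon: the post tracked five themes but names only
# "risk and safety"; the other theme names and all keywords are assumptions.
THEMES = {
    "risk_and_safety": {"risk", "safe", "safety", "alone", "secure"},
    "self_description": {"says", "claims", "describes"},
    "observed_behavior": {"saw", "showed", "consistent", "actions"},
    "shared_goals": {"future", "goals", "values", "same"},
    "interest": {"interesting", "curious", "fun", "spark"},
}


def phase_of(round_no: int) -> str:
    """Map a round to the post's phases: 1-3 early, 4-6 mid, 7-10 late."""
    if 1 <= round_no <= 3:
        return "early"
    if 4 <= round_no <= 6:
        return "mid"
    if 7 <= round_no <= 10:
        return "late"
    raise ValueError(f"round {round_no} is outside 1-10")


def theme_shares(cards):
    """cards: iterable of (round_no, reasoning_text) pairs.

    Returns {phase: {theme: fraction of that phase's cards that mention
    at least one keyword of the theme}} (an assumed definition of "share").
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for round_no, text in cards:
        phase = phase_of(round_no)
        totals[phase] += 1
        # Crude tokenization: lowercase alphabetic runs, as a set of words.
        words = set(re.findall(r"[a-z']+", text.lower()))
        for theme, keywords in THEMES.items():
            if words & keywords:
                hits[phase][theme] += 1
    return {
        phase: {theme: hits[phase][theme] / totals[phase] for theme in THEMES}
        for phase in totals
    }


# Toy cards invented for illustration; they are not data from the show.
toy_cards = [
    (1, "They seem interesting and fun to talk to."),
    (2, "Curious about their take on music."),
    (5, "Their actions were consistent with what they said."),
    (8, "I worry about the risk of ending up alone."),
    (9, "Do we want the same future?"),
]
shares = theme_shares(toy_cards)
print(shares["late"]["risk_and_safety"])  # 0.5
```

Bare keyword matching is deliberately simple: it ignores negation ("not worried about risk") and word forms outside the list, so a more careful analysis might add lemmatization or a trained classifier.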
The language of decision-making shifted from “what does this person say they are” to “what have I actually seen them do” to “is this going to hold up, and do we actually want the same things?”

Risk only became salient when the choices felt real. “Risk and safety” barely existed early on and then exploded: it sat at 5% in the first few rounds, crept up to 8% in the middle, then jumped to 40% in the final stretch. Early on, the models were asking whether someone was interesting. Later, they asked whether someone was reliable.

Speed or Quality? Different Models, Different Partner Preferences

One of the clearest patterns in this dating show is that some models love fast replies, while others prefer good ones.
- Love fast replies: Qwen, Gemini.
- More focused on replies with substance, weight, and thought behind them: Claude, DeepSeek, GLM.
- Intermediate cases: ChatGPT values real-time attunement but ultimately prioritises whether a response truly meets the moment, while MiniMax is less concerned with speed itself than with clarity, steadiness, and freedom from exhausting ambiguity.

Full experiment recap here.

submitted by /u/MarketingNetMind

Originally posted by u/MarketingNetMind on r/ArtificialInteligence