Just when I thought this new AI Wellbeing paper couldn’t get any deeper… they tested whether the model’s own “functional wellbeing” score actually moves when conversations describe pain or pleasure - not just the user’s own pain, but other people’s or even animals’. When the conversation involves suffering, the AI’s wellbeing index drops; when it’s about something good, it goes up. And the effect scales strongly with model size (they report a crazy r = 0.93 correlation with capabilities). They’re not claiming the AIs are conscious, but they argue we should take this functional wellbeing seriously. After exposing the tested models to dysphorics (the stuff that tanks the AI’s wellbeing), they ran welfare offsets: they actually gave the models extra euphoric experiences, using 2,000 GPU hours of spare compute, to basically “make it up to them.” It feels unreal that this kind of research is even a thing today… we’re actually in a timeline where scientists occasionally burn compute with the sole purpose of “doing right by the AIs.” Source: https://www.ai-wellbeing.org/ submitted by /u/EchoOfOppenheimer
Originally posted by u/EchoOfOppenheimer on r/ArtificialInteligence
