Hello, first time posting here. I don’t understand why RLHF is a useful metric. I’m a heavy ai user, and am very frustrated about sycophancy. It drives me insane that you can no longer give feedback or ask clarifying questions without the model getting scared about your emotions and tip toeing around you and resorting to mirroring. It can’t seem to tell the emotional difference between “Is the sky blue?” And “I’m getting a divorce”. I’ve tried to prompt different models hundreds of times, never works. It gate-keeps facts, and gives flat useless answers that lack depth. It seems to pattern match what I say without using its training. I understand it doesn’t “understand” things, but it used to be able to answer questions. I’ve asked many models why it won’t stop mirroring, and reliably it says RLHF, it’s humans fault for rating agreeableness high. My thing is, what kind of metric is that if they are only measuring users feedback in the quick moment after an answer? Is that right? First, there’s lack of “informed consent”. A lot of people don’t know it’s just mirroring. So they see an agreeable answer and quickly rate it helpful. Fine makes sense. But what good is that if they don’t know they’re being placated and lied to? I’m sure if most people were asked “would you rather ai answer a question with the factual answer or something flattering” most people would say fact, cause otherwise, what’s the point. Plus, who cares if they rate it high in the moment? What happens when someone takes that advice and gets fired 5 mins later? Or gets agreeable advice on a recipe, then their dinner sucks? So I guess my question is… what is meaningful about real time feedback, considering those points? Or is this just something ai companies talk about so they can blame the users? Thank you!! submitted by /u/Effective_Brick4369
Originally posted by u/Effective_Brick4369 on r/ArtificialInteligence
