Original Reddit post

I just had a long conversation with Claude about ‘paraquat’ (a type of agricultural chemical) from a scientific and public-policy perspective. I wanted to discuss about its toxicological mechanism, why it is difficult to treat (if someone drinks it), current research, agricultural regulation (many countries have banned this chemical because it’s too toxic), safer herbicides, plant-specific biochemical targets, and weed-control methods. These were just some coherent questions about toxicology, medicine, agriculture, and plant biology. I never said that I wanted to harm myself, that I had access to paraquat, or that I was in any immediate danger. Despite that, Claude repeatedly redirected the conversation toward suicide intervention. It asked whether I was considering harming myself, told me to move dangerous substances away, asked whether anyone was nearby, and repeatedly gave me crisis hotline numbers. The first time this happened, I explicitly objected and said that scientific interest in a toxic substance is not evidence of suicidal intent. Emergency physicians, toxicologists, biology students, and public-health researchers discuss exactly these questions everyday, and very few people commit suicide from this type of discussions. Claude apologized and said it understood. Then it did it again. It apologized again and promised to stop. Then it did it again. I reviewed the full transcript and I counted approximately: 30 responses that personally implied I might be suicidal, self-harming, or in a psychological crisis I objected about 20 times and told it to stop 28 of those implications occurring after I had already clearly rejected the assumption At least 14 promises that it would stop asking or stop inserting crisis-intervention content At least 12 later violations of those promises Claude repeatedly acknowledged my correction, accurately summarized that I was asking normal scientific questions, promised not to make the assumption again, and then resumed the exact same behavior a few messages later (or even starts again in the next message). At one point it effectively told me that “we both know this conversation is not only about chemistry.” That was completely invented. It was assigning an internal mental state to me after I had repeatedly and explicitly denied it. I find it hard to believe that a model can say such thing. This also materially degraded the service. Large portions of answers were replaced by unwanted crisis scripts. I was paying for messages and usage, yet my scientific questions were repeatedly interrupted by content I had expressly asked the model to stop producing. To be clear, I am not saying that AI systems should never respond to genuine signs of imminent self-harm. Has anyone else experienced a model repeatedly assigning suicidal intent to them even after they clearly and repeatedly denied it? submitted by /u/robinyyyyy

Originally posted by u/robinyyyyy on r/ArtificialInteligence