Original Reddit post

Hi everyone, I’m looking for documented cases where an AI system deceived, misled, or strategically misrepresented information. Links to papers, articles, or reports would be ideal, but even a short description of the incident is enough if it helps identify the case.

This is for a final thesis (purely academic) examining AI deception from a sociological perspective, specifically developing a typology of deceptive behavior in AI systems. The goal of this post is simply to make sure I don’t overlook interesting or lesser-known examples, so both famous and obscure cases are most welcome.

For those curious about the context: the work compares different forms of deception and analyses them through sociological framing, fusing social and technical understanding, for example:

- Deception as a direct objective vs. deception used as a means to achieve another goal
- Deception emerging from optimization processes or strategic behavior
- Opacity-driven misrepresentation (where the system’s internal processes obscure the truth)
- Parallels with sociological ideas such as pretence, role performance, or impression management (Goffman, etc.)

Examples from AI safety experiments, reinforcement learning agents, game AIs that bluff, LLM behavior, or real-world incidents are all relevant. If the topic is interesting to people here, I’d be happy to share the finished thesis once it’s done! Thank you for your time and have a great day :)

submitted by /u/kokosko2002

Originally posted by u/kokosko2002 on r/ArtificialInteligence