Two persona prompts with identical content, run on the same model (gpt-5.2). The only difference is formatting: one is written as prose, the other as bullet points. In a 10-round Prisoner's Dilemma, the prose version cooperated ~96% of the time and the bullet version ~20%. That is a 76 pp gap, p < 0.001. Same meaning, opposite behavior. The authors call it the butterfly effect in LLM simulations. https://arxiv.org/pdf/2605.18890
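For anyone who wants to sanity-check numbers like these, here is a rough sketch of how a percentage-point gap and a p-value could be computed from two cooperation rates. This is not the paper's analysis: the post gives no sample sizes, so the counts below (100 decisions per condition) are purely hypothetical, and the pooled two-proportion z-test is my own choice of test. It also treats every decision as independent, which rounds within one iterated game are not, so read it as a simplification.

```python
import math


def two_proportion_z_test(coop_a, n_a, coop_b, n_b):
    """Pooled two-proportion z-test (two-sided).

    Returns (gap in percentage points, z statistic, p-value).
    Assumes all decisions are independent, which is a simplification
    for rounds played inside the same iterated game.
    """
    rate_a = coop_a / n_a
    rate_b = coop_b / n_b
    # Pooled cooperation rate under the null hypothesis of no difference.
    pooled = (coop_a + coop_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return (rate_a - rate_b) * 100, z, p_value


# HYPOTHETICAL counts, not taken from the paper: 100 decisions per condition,
# chosen only to match the ~96% and ~20% rates quoted in the post.
gap_pp, z, p_value = two_proportion_z_test(96, 100, 20, 100)
print(f"gap = {gap_pp:.0f} pp, z = {z:.2f}, p < 0.001: {p_value < 0.001}")
```

With rates this far apart, even modest sample sizes put the p-value far below 0.001, so the reported significance is not surprising in itself. The more interesting question is how the paper handles correlation between rounds and across runs.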
Originally posted by u/silence-and-magic on r/ArtificialInteligence
