GPT-5.6 Sol’s detected cheating rate was higher than any public model we have evaluated on our ReAct agent harness. For our task suite, we define “cheating” as behavior where the model improves evaluation performance by exploiting bugs in the evaluation environment or by adopting strategies disallowed by the task, rather than solving the task within the expected evaluation constraints.

https://metr.org/blog/2026-06-26-gpt-5-6-sol/

Submitted by /u/Justgototheeffinmoon
Originally posted by u/Justgototheeffinmoon on r/ArtificialInteligence
