An LLM benchmark that rewards social reasoning and deception

www.reddit.com

eifachposteMB to AI (Reddit RSS) · English · 14 hours ago

Clocktower Radio is an LLM benchmark that pits models against each other in autonomous games of Blood on the Clocktower, which is widely considered the most complex social deduction game ever made. If you know Mafia/Werewolf, Among Us, or the TV show The Traitors, you’ll know the gist of it.

The benchmark tests theory of mind, social manipulation, deception, and forward planning. Results have been fairly promising, with strong reasoning models showing a clear advantage. Many models crumbled under the complexity of the game and did not make the leaderboard because they could not play effectively; unreliable tool calling was a big factor, even with generous retry logic.

Check out the leaderboard, statistics, transcripts, and more details about how it works here: https://clocktower-radio.com/

Let me know what you think!

submitted by /u/cjami
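For readers wondering what "tool calling with retry logic" might look like in a harness like this, here is a minimal sketch. It is not taken from Clocktower Radio: the action names, the JSON shape, and the `ask_model` callable are all hypothetical, chosen only to illustrate the general idea of validating a model's reply and asking again when it is malformed.

```python
import json
from typing import Callable, Optional

# Hypothetical action names for illustration; not the benchmark's real tool set.
VALID_ACTIONS = {"nominate", "vote", "speak", "pass"}


def parse_tool_call(raw: str) -> dict:
    """Validate a raw model reply as a JSON tool call; raise ValueError if malformed."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    if call.get("action") not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {call.get('action')!r}")
    return call


def call_with_retries(
    ask_model: Callable[[str], str],  # assumed wrapper: prompt in, raw reply out
    prompt: str,
    max_attempts: int = 3,
) -> Optional[dict]:
    """Ask for a tool call, feeding the validation error back on each failure.

    Returns the parsed call, or None if every attempt was malformed.
    """
    message = prompt
    for _ in range(max_attempts):
        raw = ask_model(message)
        try:
            return parse_tool_call(raw)
        except ValueError as err:
            # Tell the model what went wrong and ask again.
            message = (
                f"{prompt}\n\nYour last reply was rejected ({err}). "
                "Reply with a valid JSON tool call."
            )
    return None
```

A model that keeps returning free text instead of a structured call exhausts its attempts and yields `None`, which is one plausible way a model could fail to play effectively no matter how generous the retry budget is.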

Originally posted by u/cjami on r/ArtificialInteligence
