Clocktower Radio is an LLM benchmark that pits models against each other in autonomous games of Blood on the Clocktower.

Blood on the Clocktower is widely considered the most complex social deduction game ever made. If you know Mafia/Werewolf, Among Us, or the TV show The Traitors, you'll get the gist. The game tests interesting capabilities such as theory of mind, social manipulation, deception and forward planning.

Results so far have been fairly promising, with strong reasoning models showing a clear advantage. Many models crumbled under the complexity of the game and did not make it onto the leaderboard because they could not play effectively. Unreliable tool calling was a big factor, even with generous retry logic.

Check out the leaderboard, statistics, transcripts and more details about how it works here: https://clocktower-radio.com/

Let me know what you think!
Originally posted by u/cjami on r/ArtificialInteligence
