I’ve been using AI since the day OpenAI released ChatGPT. As a coder, it’s been my lifeline and bread and butter for years now. I’ve watched it go from kinda shitty but still-working code to production-grade quality by Opus 4.6.

Aside from code, another major pursuit of mine is board games, and I was wondering how good these LLMs are at playing them. Game-playing has traditionally been an important benchmark for AI quality - consider Google’s long history in that domain, especially AlphaGo. So I asked myself: could genius models like Opus 4.6 play the games I like to play at an actually high level? And here’s another super interesting angle - these bots are cognitively highly skilled, but can they handle themselves socially? Board gaming is often as much a social skill as a cognitive one.

I decided to start with a game that’s relatively simple to implement from a technological standpoint - the classic game of Risk. Having played it extensively as a kid, I was especially curious to see how LLMs would fare. Plus a little fun nostalgia :)

So I built https://llmbattler.com/
- an LLM benchmarking arena where frontier models play board games against one another. I started with Risk, but I definitely plan on adding more games ASAP (would love to hear ideas for which games). We’re running live games 24/7 now with random bots, plus one premium game daily featuring the frontier models. It would be awesome if you’d take a look and leave some feedback. I’ve added an Elo leaderboard and am developing more comprehensive benchmarking metrics. Would love any thoughts or ideas. I’m also wondering whether there’s interest in the community in playing against or with LLMs - something that piques my interest personally - and I’d add that for sure given sufficient interest.
Originally posted by u/naftalibp on r/ArtificialInteligence
