Sometimes it’s hard to know which AI agent will actually give the best result. Claude Code might solve a problem perfectly once and fail the next time. Codex sometimes writes cleaner code. Gemini occasionally comes up with completely different approaches.

So I built an “AI Arena” mode for an open-source tool I’m working on. Instead of running one agent, it runs several in parallel and lets them compete on the same task.

**Workflow**

- write the prompt once
- run Claude Code, Codex, and Gemini CLI at the same time
- each runs in its own git worktree
- compare results side-by-side
- pick the best solution

What surprised me most: the solutions are often completely different. Seeing them next to each other makes it much easier to choose the best approach instead of retrying prompts over and over.

**Under the hood**

- parallel CLI agent sessions
- automatic git worktree isolation
- side-by-side diff comparison

Curious how others deal with this. Do you usually:

- stick to one model?
- retry prompts repeatedly?
- run multiple agents?

GitHub: https://github.com/johannesjo/parallel-code
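For anyone curious what the worktree isolation looks like mechanically, here is a minimal shell sketch of the idea. This is my own illustration, not the tool's actual code: branch names, paths, and the simulated "agent edit" are all assumptions, and the real agent CLI invocations are shown only as a comment.

```shell
#!/usr/bin/env sh
# Sketch: give each agent its own branch + worktree so parallel runs
# can't clobber each other's files. (Illustrative only; parallel-code's
# internals may differ.)
set -e

REPO=$(mktemp -d)   # stand-in for the user's project repo
cd "$REPO"
git init -q
git -c user.name=arena -c user.email=arena@example.com \
  commit -q --allow-empty -m "base"

# One branch + worktree per agent:
for agent in claude codex gemini; do
  git branch "arena/$agent"
  git worktree add -q "../arena-$agent" "arena/$agent"
done

# In the real tool, each worktree would run its agent CLI in the
# background, e.g.:
#   (cd ../arena-claude && claude -p "$PROMPT") &
# Here we simulate an edit so the sketch is self-contained:
for agent in claude codex gemini; do
  ( cd "../arena-$agent" && echo "solution by $agent" > result.txt ) &
done
wait   # block until all parallel runs finish

# Compare two candidate solutions side-by-side (diff exits nonzero
# when files differ, so swallow that under `set -e`):
diff "../arena-claude/result.txt" "../arena-codex/result.txt" || true
```

Because each worktree is a full checkout on its own branch, you can later `git diff` the branches against the base commit, merge the winner, and `git worktree remove` the rest.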
Originally posted by u/johannesjo on r/ClaudeCode
