Original Reddit post

I’m looking for a handful of testers for a web experience I’ve been building. Text-based, 10–15 min, no install required.

The core: 8 AI systems are assigned distinct roles in a fictional scenario and interact — not with each other in real time, but each generating its own response to the same situation, with full context of what the others produced before it.

The interesting part, from a model-behavior standpoint: you can directly compare how each AI approaches the same task — argumentation, tone, risk tolerance, tendency to moralize. Same prompt structure, same subject, 8 different outputs side by side.

Some things I noticed during testing that might interest you:

∙ Significant variance in how models handle adversarial inputs
∙ Consistent personality differences between providers, even at the same temperature
∙ One model kept scoring near 0% on a specific outcome until I adjusted its tier — turned out to be a literal interpretation problem, not a calibration issue

It’s wrapped in a narrative frame (think bureaucratic dystopia), but the underlying architecture might be worth looking at for anyone interested in comparative model behavior.

DM me if you want the link (please mention which sub you came from).

(Comments welcome, but please keep spoilers out — let others discover it for themselves.)
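For anyone curious what the "same scenario, sequential generation, shared context" setup might look like, here is a minimal sketch. All names here are hypothetical — the post doesn't show the actual implementation, and the model call is a stub standing in for a real API:

```python
# Hypothetical sketch: each role-assigned model answers the same scenario,
# and later models also receive the outputs generated before them.

def call_model(model_name: str, prompt: str) -> str:
    """Stub standing in for a real LLM API call (hypothetical)."""
    return f"[{model_name} responds to a {len(prompt)}-char prompt]"

def run_round(scenario: str, roles: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Generate one response per (model, role) pair, in order.

    Each prompt embeds the full transcript so far, so model N sees
    what models 1..N-1 produced for the same scenario.
    """
    transcript: list[tuple[str, str, str]] = []
    for model, role in roles:
        prior = "\n".join(
            f"{m} (as {r}) said: {out}" for m, r, out in transcript
        )
        prompt = (
            f"Scenario: {scenario}\n"
            f"Your role: {role}\n"
            f"Responses so far:\n{prior or '(none yet)'}\n"
            f"Respond in character."
        )
        transcript.append((model, role, call_model(model, prompt)))
    return transcript

roles = [("model-a", "clerk"), ("model-b", "inspector")]
result = run_round("A permit has gone missing.", roles)
```

Because every model gets the same prompt scaffold and scenario, the 8 outputs differ only in the model (and its position in the sequence), which is what makes the side-by-side comparison of tone and risk tolerance meaningful.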

Originally posted by u/slaading on r/ArtificialInteligence