I’m curious whether anyone here actually chooses AI models based on benchmark charts like the one from Artificial Analysis: https://artificialanalysis.ai/models#intelligence I’d love to hear your honest opinions, because I’ve noticed something interesting: models with high scores don’t always perform well in practice (or am I doing it wrong?).

For example, I asked several AI models to generate a study plan for a complete beginner who wants to build strong foundational skills in networking. Some of the responses felt very generic. In my experience, Gemini and Perplexity were average to below average, while a few others performed noticeably better.

Also, is it just me, or have models like Kimi ( www.kimi.com ) and Xiaomimimo ( www.mimo.mi.com ) improved a lot recently? I’ve seen a few posts about Kimi on reddit, which made me curious. Personally, Xiaomimimo has been giving me the best results lately, especially for structured study plans and more personalized tasks.

So I’m wondering: do you choose AI tools based on benchmark scores, or do you rely more on real-world performance and personal testing?
Originally posted by u/pastaphome on r/ArtificialInteligence
