Original Reddit post

been rotating through 5 chinese coding models on a TS/Next codebase for the last 4-5 weeks. Kimi K2.6, GLM-5.1, MiMo V2.5 Pro, MiniMax 2.7, DeepSeek V4 Pro. wanted to share where i landed and ask about M3.

quick per-category from my runs:

Frontend / design → K2.6
Backend → K2.6 and GLM-5.1
Code review → MiMo
All-rounder → M2.7
Reasoning-heavy → DeepSeek

afterwards i found llmdevguy posted a near-identical ranking on X a couple weeks back (162k views, 2.3k likes) and ended it with “now i’m waiting for MiniMax 3.0 to take the number 1 spot.” weird to land in the exact same place.

https://preview.redd.it/01k9njcpmo2h1.png?width=1190&format=png&auto=webp&s=ef920c65d32a34f1dc054718813d3bb57b54037e

M2.7 didn’t win any single category for me. what surprised me is cost. Kilo Code posted a benchmark on ClaudeAI: M2.7 hit ~90% of Opus 4.6 quality at ~7% of the cost ($0.27 vs $3.67 across three coding tasks). my own runs aren’t scientific but the ratio tracks.

short version of the shortcomings: thinner tests and it jumps straight to code instead of walking through reasoning. so i reach for it as an executor once a stronger model has planned, not as the planner.

real question is whether M3 closes the planning and test-coverage gap. if it does, all-rounder becomes top of every category pretty fast.

anyone else doing side-by-side runs? does this hold on python / go / rust or is it a TS thing?

submitted by /u/davilucas1978
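
for anyone who wants to sanity-check the cost claim, here's a minimal sketch of the arithmetic in TypeScript. the two dollar totals and the ~90% quality figure are the ones quoted in the post. the function names and the "quality per dollar" multiple are made up for illustration and are not part of the Kilo Code benchmark.

```typescript
// Totals quoted in the post for the Kilo Code benchmark (three coding tasks).
const M27_COST_USD = 0.27;
const OPUS_COST_USD = 3.67;
// Approximate relative quality quoted in the post (~90% of Opus 4.6).
const M27_RELATIVE_QUALITY = 0.9;

// Fraction of the baseline's cost that the cheaper model spends.
function costRatio(cheapUsd: number, baselineUsd: number): number {
  if (baselineUsd <= 0) {
    throw new RangeError("baseline cost must be positive");
  }
  return cheapUsd / baselineUsd;
}

// Hypothetical helper (an assumption, not a benchmark metric):
// relative quality delivered per relative dollar spent.
function qualityPerDollarMultiple(relativeQuality: number, ratio: number): number {
  return relativeQuality / ratio;
}

const ratio = costRatio(M27_COST_USD, OPUS_COST_USD);
const multiple = qualityPerDollarMultiple(M27_RELATIVE_QUALITY, ratio);

console.log(`cost ratio: ${(ratio * 100).toFixed(1)}%`); // prints "cost ratio: 7.4%"
console.log(`quality per dollar vs baseline: ${multiple.toFixed(1)}x`); // prints "... 12.2x"
```

so the "~7%" is 0.27 / 3.67 ≈ 7.4%, and if you take the ~90% quality number at face value, that works out to roughly 12x the quality per dollar on those three tasks. it says nothing about tasks outside that benchmark.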

Originally posted by u/davilucas1978 on r/ArtificialInteligence