A 1B model scoring 40.42 on AIME 2025 should not be possible. AIME is the American Invitational Mathematics Examination, the kind of test that filters out most humans who attempt it. Qwen3-0.6B scores 16.25 on the same benchmark. LFM2.5-1.2B, a larger model, scores 31.88. MiniCPM5-1B, at roughly one billion parameters, beats both. OpenBMB just dropped MiniCPM5-1B, the first model in their MiniCPM5 series, and it's built specifically for scenarios like on-device deployment, resource-constrained environments, and local inference on consumer hardware. The AIME score is surprising. The telecom agent benchmark is even more surprising. And then there's the desktop pet. We'll get to that.