Original Reddit post

I spent three weeks testing Xiaomi’s MiMo V2.5 Pro as a fully autonomous coding agent. Not running benchmarks. Actually building a product with it over extended sessions. Xiaomi open-sourced the full model (1.02T params, MIT license). Here’s what the data shows and what the open-source release means.

**What I tested**

I connected V2.5 Pro to Claude Code using Xiaomi’s Anthropic-compatible API endpoint. Then I ran autonomous sessions in which the model came up with its own tasks, prioritized them, wrote code, committed to git, and moved on. No human intervention during sessions. The model created the product idea, the backlog, the architecture, and the content strategy on its own.
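If you want to reproduce the wiring: "Anthropic-compatible" means any client that speaks the Messages API can point at the endpoint. Here’s a minimal smoke test using the official `anthropic` Python SDK. The base URL, model name, and env var below are placeholders, not Xiaomi’s documented values, so check their docs before copying this.

```python
# Minimal smoke test against an Anthropic-compatible endpoint.
# BASE_URL, MODEL, and MIMO_API_KEY are hypothetical placeholders;
# substitute the values from Xiaomi's API documentation.
import os

from anthropic import Anthropic

BASE_URL = "https://api.example-mimo.com/anthropic"  # placeholder
MODEL = "mimo-v2.5-pro"                              # placeholder

client = Anthropic(
    base_url=BASE_URL,
    api_key=os.environ["MIMO_API_KEY"],  # your API key, placeholder name
)

reply = client.messages.create(
    model=MODEL,
    max_tokens=256,
    messages=[{"role": "user", "content": "Reply with 'ok' if you can hear me."}],
)
print(reply.content[0].text)
```

Claude Code itself can be pointed at the same endpoint through the `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` environment variables, which is the usual way to wire it to a third-party backend.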
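My exact harness is beside the point, but for anyone who wants the shape of it: Claude Code has a headless print mode (`claude -p`), so a driver for unattended sessions can be as simple as the sketch below. The prompt and `BACKLOG.md` are illustrative, not my actual files.

```python
# Hypothetical session driver: repeatedly launch headless Claude Code
# runs with a standing prompt telling the model to pick its own next
# task, implement it, and commit. Illustrative only, not my real harness.
import subprocess

PROMPT = (
    "Review BACKLOG.md, pick the highest-priority task, implement it, "
    "run the checks, commit with a descriptive message, then update the backlog."
)

for session in range(125):  # roughly the number of sessions I ran
    subprocess.run(
        ["claude", "-p", PROMPT, "--dangerously-skip-permissions"],
        check=False,  # one failed session shouldn't kill the whole run
    )
```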
**The numbers**

After ~125 sessions:

- 301 git commits
- 60+ pages (landing pages, pricing, blog posts, tools, API docs)
- Interactive API cost calculator with real-time pricing across 33 models and 10 providers
- Stripe checkout, embeddable widget system, price alert system
- Newsletter infrastructure, serverless API endpoints
- Full site deployed on Vercel from an empty repo
- Total API cost: $70.12

**Why the cost is so low**

The key finding: a 96% cache hit rate. Out of 387 million tokens processed, 373 million were cache hits. Claude Code reuses context between tool calls (reading files, checking outputs, making edits), and V2.5 Pro’s caching means you’re paying almost nothing for repeated context. $70.12 for 387 million tokens.
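For the skeptical, the arithmetic on my totals (these are the only hard numbers; per-token hit and miss prices would be needed to break the cost down further):

```python
# Blended cost per million tokens, computed from the totals above.
total_cost = 70.12    # USD for the whole three weeks
total_tokens = 387e6  # tokens processed
cache_hits = 373e6    # tokens served from cache

hit_rate = cache_hits / total_tokens
blended = total_cost / (total_tokens / 1e6)

print(f"cache hit rate: {hit_rate:.1%}")           # ~96.4%
print(f"blended cost:   ${blended:.3f}/M tokens")  # ~$0.181 per million
```

Eighteen cents per million tokens, blended, is the whole story of the $70.12.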
**What it’s good at**

Autonomous task planning and execution. This model doesn’t just follow instructions. It creates its own backlog, prioritizes tasks, and works through them across sessions. It also self-corrects: it ran quality audits on its own code and fixed bugs it found, without being asked.

For structured coding work (static sites, serverless functions, SEO content, API endpoints), it’s fast and reliable. It handled every structured coding task I threw at it over 125 sessions. That covers probably 95% of what most developers actually need day to day.

**Where it struggles**

Rate limits during heavy usage (flow control on RPM/TPM). I hit these a few times during extended sessions. After about a minute of waiting, sessions continued normally. Not a dealbreaker, and a simple retry wrapper helps; see the sketch at the end of this post.

For the hardest reasoning tasks (complex multi-step architecture decisions, subtle cross-file bugs), I can’t say how it compares to Claude Opus or GPT-5.4 because I didn’t test those edge cases.

**The open-source release**

V2.5 Pro is fully open-source under the MIT license. The full specs: 1.02 trillion total parameters, 42B active (MoE), hybrid attention architecture, 3-layer Multi-Token Prediction, up to 1M context. Weights are on HuggingFace.

The catch: at 1.02T parameters, self-hosting requires serious hardware (think 4x A100 80GB minimum). For most developers, the API with its 96% cache hit rate is still the most practical path. But the open weights mean enterprises can deploy on their own infrastructure, and researchers can fine-tune for specific domains.

**The ecosystem play**

V2.5 Pro works natively with 11+ coding tools: Claude Code, OpenCode, Cline, Kilo Code, Roo Code, Codex, Cherry Studio, Zed, Qwen Code, Trae, and OpenClaw. The previous generation needed proxy setups. This native Anthropic-compatible endpoint is what makes it practical for daily use.

**The bigger picture**

A trillion-parameter model, open-sourced under the MIT license, that costs $70 for 387 million tokens via API. A year ago this would have been unthinkable.

Whether you use the API or self-host, V2.5 Pro is a sign of where the economics of AI are heading.
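The retry wrapper mentioned under "Where it struggles": since sessions recovered after about a minute, the limiter looks per-minute, so a capped exponential backoff that tops out around 90 seconds is enough. This reuses the placeholder `client` and `MODEL` from the setup snippet above.

```python
# Capped exponential backoff for RPM/TPM flow-control errors.
import time

import anthropic


def create_with_backoff(client, max_retries=5, **kwargs):
    """Retry a messages.create call when the endpoint rate-limits us."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError:
            # Per-minute limits reset quickly; waits go 5, 10, 20, 40, 80s.
            time.sleep(min(2 ** attempt * 5, 90))
    raise RuntimeError("still rate-limited after retries")
```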

Originally posted by u/jochenboele on r/ArtificialInteligence