Original Reddit post

https://preview.redd.it/05xhubaufmpg1.png?width=1380&format=png&auto=webp&s=4813fedca619441002f4c86c87edf95b4828e687

The problem every web dev hits

You’re 2 hours into a debugging session. Claude hits its hourly limit. You go to the dashboard, swap API keys, reconfigure your IDE. Flow destroyed. The frustrating part: there are great free AI tiers most devs barely use:

  • Kiro → full Claude Sonnet 4.5 + Haiku 4.5, unlimited, via AWS Builder ID (free)
  • iFlow → kimi-k2-thinking, qwen3-coder-plus, deepseek-r1, minimax (unlimited via Google OAuth)
  • Qwen → 4 coding models, unlimited (Device Code auth)
  • Gemini CLI → gemini-3-flash, gemini-2.5-pro (180K tokens/month)
  • Groq → ultra-fast Llama/Gemma, 14.4K requests/day free
  • NVIDIA NIM → 70+ open-weight models, 40 RPM, forever free

But each requires its own setup, and your IDE can only point to one at a time.

What I built to solve this

OmniRoute — a local proxy that exposes one localhost:20128/v1 endpoint. You configure all your providers once, build a fallback chain (“Combo”), and point all your dev tools there. My “Free Forever” Combo:

  1. Gemini CLI (personal acct) — 180K/month, fastest for quick tasks
     ↕ load distributed with
  1b. Gemini CLI (work acct) — +180K/month pooled
     ↓ when both hit the monthly cap
  2. iFlow (kimi-k2-thinking) — great for complex reasoning, unlimited
     ↓ when slow or rate-limited
  3. Kiro (Claude Sonnet 4.5, unlimited) — my main fallback
     ↓ emergency backup
  4. Qwen (qwen3-coder-plus, unlimited)
     ↓ final fallback
  5. NVIDIA NIM (open models, forever free)

OmniRoute distributes requests across multiple accounts of the same provider using round-robin or least-used strategies. My two Gemini accounts share the load: when the active one is busy or nearing its daily cap, requests shift to the other automatically. When both hit the monthly limit, OmniRoute falls back to iFlow (unlimited). iFlow slow? → routes to Kiro (real Claude). Your tools never see the switch — they just keep working.
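The chain boils down to ordered failover: try each tier in priority order, skip any that is capped or down. A minimal sketch of the idea (the names and structure here are illustrative, not OmniRoute's actual code):

```python
# Illustrative sketch of a tiered fallback chain (not OmniRoute's real code).
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    healthy: bool = True  # set False when rate-limited, capped, or down


def route(chain: list[Provider]) -> Provider:
    """Return the first healthy provider in priority order."""
    for p in chain:
        if p.healthy:
            return p
    raise RuntimeError("all providers exhausted")


combo = [Provider("gemini-cli"), Provider("iflow"), Provider("kiro"),
         Provider("qwen"), Provider("nvidia-nim")]
combo[0].healthy = False       # Gemini hit its monthly cap
print(route(combo).name)       # -> iflow: request falls through to tier 2
```

The real proxy layers health checks and quota counters on top, but the priority walk is the core of a "Combo".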

Practical things it solves for web devs

  • Rate limit interruptions → multi-account pooling + 5-tier fallback with circuit breakers = zero downtime
  • Paying for unused quota → cost visibility shows exactly where money goes; free tiers absorb overflow
  • Multiple tools, multiple APIs → one localhost:20128/v1 endpoint works with Cursor, Claude Code, Codex, Cline, Windsurf, any OpenAI SDK
  • Format incompatibility → built-in translation: OpenAI ↔ Claude ↔ Gemini ↔ Ollama, transparent to the caller
  • Team API key management → issue scoped keys per developer, restrict by model/provider, track usage per key

[IMAGE: dashboard with API key management, cost tracking, and provider status]
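The format translation is conceptually a mapping between request schemas. As one example, OpenAI-style chat requests put the system prompt inside `messages`, while Anthropic's Messages API takes it as a top-level `system` field and requires `max_tokens`. A rough sketch of that direction (the helper function is hypothetical, not OmniRoute's API):

```python
# Hypothetical sketch: translate an OpenAI-style chat request into the
# Anthropic Messages API shape (system prompt moves to a top-level field).
def openai_to_claude(req: dict) -> dict:
    system = " ".join(m["content"] for m in req["messages"]
                      if m["role"] == "system")
    out = {
        "model": req["model"],
        "max_tokens": req.get("max_tokens", 1024),  # required by Anthropic
        "messages": [m for m in req["messages"] if m["role"] != "system"],
    }
    if system:
        out["system"] = system
    return out


req = {"model": "claude-sonnet-4-5", "messages": [
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "hi"}]}
print(openai_to_claude(req)["system"])  # -> You are terse.
```

A full translator also has to map streaming chunks, tool calls, and usage fields back the other way, which is the part that is genuinely tedious to hand-roll per tool.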

Already have paid subscriptions? OmniRoute extends them.

You configure the priority order:

Claude Pro → when exhausted → DeepSeek native ($0.28/1M) → when budget limit hit → iFlow (free) → Kiro (free Claude)

If you have a Claude Pro account, OmniRoute uses it as first priority. If you also have a personal Gemini account, you can combine both in the same combo. Your expensive quota gets used first; when it runs out, you fall to cheap, then free. The fallback chain means you stop wasting money on quota you’re not using.

Quick start (2 commands)

npm install -g omniroute
omniroute

Dashboard opens at http://localhost:20128/.

  1. Go to Providers → connect Kiro (AWS Builder ID OAuth, 2 clicks)
  2. Connect iFlow (Google OAuth) and Gemini CLI (Google OAuth) — add multiple accounts if you have them
  3. Go to Combos → create your free-forever chain
  4. Go to Endpoints → create an API key
  5. Point Cursor/Claude Code to localhost:20128/v1

Also available via Docker (AMD64 + ARM64) or the desktop Electron app (Windows/macOS/Linux).

What else you get beyond routing

  • 📊 Real-time quota tracking — per account per provider, reset countdowns
  • 🧠 Semantic cache — repeated prompts in a session = instant cached response, zero tokens
  • 🔌 Circuit breakers — provider down? <1s auto-switch, no dropped requests
  • 🔑 API Key Management — scoped keys, wildcard model patterns (claude/*, openai/*), usage per key
  • 🔧 MCP Server (16 tools) — control routing directly from Claude Code or Cursor
  • 🤖 A2A Protocol — agent-to-agent orchestration for multi-agent workflows
  • 🖼️ Multi-modal — same endpoint handles images, audio, video, embeddings, TTS
  • 🌍 30-language dashboard — if your team isn’t English-first
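A circuit breaker, as in the feature list above, trips after consecutive failures and diverts traffic until a cooldown passes, then allows a probe request. A minimal stdlib-only sketch (the thresholds are illustrative, not OmniRoute's defaults):

```python
# Illustrative circuit-breaker sketch (thresholds are made up for the example).
import time


class CircuitBreaker:
    """Open after `threshold` consecutive failures; retry after `cooldown` s."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures, self.opened_at = 0, None

    def available(self) -> bool:
        if self.opened_at is None:
            return True  # closed: provider is taking traffic
        # open: only allow a probe once the cooldown has elapsed (half-open)
        return time.monotonic() - self.opened_at >= self.cooldown

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None  # success resets the breaker
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()


cb = CircuitBreaker(threshold=2, cooldown=30)
cb.record(False); cb.record(False)
print(cb.available())  # -> False: breaker open, route to the next tier instead
```

While a breaker is open, the router simply treats that provider as unhealthy and walks the chain, which is what makes the switch instant rather than waiting out a timeout on every request.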
## 🔌 All 50+ Supported Providers
### 🆓 Free Tier (Zero Cost, OAuth)
### 🔐 OAuth Subscription Providers (CLI Pass-Through)
> These providers work as **subscription proxies**: OmniRoute redirects your existing paid CLI subscriptions through its endpoint, making them available to all your tools without reconfiguring each one.
### 🔑 API Key Providers (Pay-Per-Use + Free Tiers)
---
## 🛠️ CLI Tool Integrations (14 Agents)
OmniRoute integrates with 14 CLI tools in **two distinct modes**:
### Mode 1: Redirect Mode (OmniRoute as endpoint)
Point the CLI tool to `localhost:20128/v1`; OmniRoute handles provider routing, fallback, and cost tracking. All tools work with zero code changes.
### Mode 2: Proxy Mode (OmniRoute uses CLI as a provider)
OmniRoute connects to the CLI tool's running subscription and uses it as a provider in combos. The CLI's paid subscription becomes a tier in your fallback chain.
**Multi-account:** Each subscription provider supports up to 10 connected accounts. If you and 3 teammates each have Claude Code Pro, OmniRoute pools all 4 subscriptions and distributes requests using a round-robin or least-used strategy.
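Least-used pooling just means: among the connected accounts for a provider, send the next request to whichever has served the fewest so far. A short sketch (account names are made up):

```python
# Illustrative least-used account pool (names and structure are hypothetical).
from collections import Counter


class Pool:
    """Distribute requests across pooled accounts, least-used first."""

    def __init__(self, accounts: list[str]):
        self.used = Counter({a: 0 for a in accounts})

    def acquire(self) -> str:
        account = min(self.used, key=self.used.__getitem__)  # fewest requests
        self.used[account] += 1
        return account


pool = Pool(["alice", "bob", "carol", "dave"])  # 4 pooled Claude Code Pro seats
picks = [pool.acquire() for _ in range(8)]
print(Counter(picks))  # each of the 4 accounts handles exactly 2 requests
```

With equal counts this degenerates into round-robin; the two strategies only diverge once some accounts start getting skipped for rate limits or caps.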
---
**GitHub:**
https://github.com/diegosouzapw/OmniRoute
Free and open-source (GPL-3.0).

Originally posted by u/ZombieGold5145 on r/ArtificialInteligence