I built Axion AI and want to share the technical approach since I learned a lot from this community.

The problem I was solving: Running evals or building apps across multiple LLM providers means dealing with different SDKs, auth systems, and response formats. I wanted a single normalized interface.

How it works: The core is a PHP routing layer that maps OpenAI-style requests to each provider’s native format. When you send a request to /v1/chat/completions, it:

- Validates your API key and checks credit balance
- Maps the model name (e.g. “anthropic/claude-opus-4”) to the provider’s internal model ID
- Forwards the request to DigitalOcean’s Gradient inference API
- Normalizes the response back to OpenAI format
- Tracks token usage and calculates credits using per-model rates

Credit calculation: Each model has different input/output rates. I store them as credits-per-1K-tokens and apply a ~40/60 input/output split since most chat completions skew toward longer outputs.

Rate limiting: Uses a sliding window stored per API key. Timestamps of recent requests are stored as a comma-separated string, pruned on each request to only keep the last 60 seconds.

Limitations I’m still working on:
- No streaming support yet
- Token split is estimated, not exact
- Single upstream provider (DO Gradient), so model availability depends on them

Models currently supported: GPT-4o, Claude Opus/Sonnet/Haiku, Llama 3.3 70B, DeepSeek R1, Qwen 3 32B, Mistral Nemo, NVIDIA Nemotron 120B, and more.

Demo: https://axion.mikedev.site/

Docs: https://axion.mikedev.site/docs

Happy to discuss the architecture or any of the tradeoffs I made.

submitted by /u/Mikeeeyy04
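
For anyone who wants to poke at the credit and rate-limit logic described above, here is a minimal Python sketch of the idea. It is not the actual PHP code behind Axion. The function names (`estimate_credits`, `check_rate_limit`), the `MODEL_RATES` table and its numbers, the `max_requests` limit, and the use of integer Unix timestamps are all assumptions made for illustration. It also assumes the ~40/60 split is applied to the total token count of a request, which the post implies but does not spell out.

```python
from typing import Tuple

WINDOW_SECONDS = 60   # from the post: only the last 60 seconds are kept
INPUT_SHARE = 0.4     # from the post: roughly 40% input, 60% output

# Hypothetical rates in credits per 1K tokens; not Axion's real pricing.
MODEL_RATES = {
    "example/model-a": {"input": 1.0, "output": 2.0},
    "example/model-b": {"input": 3.0, "output": 9.0},
}


def estimate_credits(model: str, total_tokens: int) -> float:
    """Estimate the credit cost from a total token count and a fixed split."""
    rates = MODEL_RATES[model]
    # The split is an estimate, not the exact prompt/completion counts.
    input_tokens = round(total_tokens * INPUT_SHARE)
    output_tokens = total_tokens - input_tokens
    return (input_tokens / 1000) * rates["input"] + (
        output_tokens / 1000
    ) * rates["output"]


def check_rate_limit(stored: str, now: int, max_requests: int) -> Tuple[bool, str]:
    """Sliding-window check over a comma-separated string of timestamps.

    Returns (allowed, updated_string). The updated string is what would be
    written back to the per-key storage.
    """
    cutoff = now - WINDOW_SECONDS
    # Prune: keep only timestamps that fall inside the window.
    recent = [int(t) for t in stored.split(",") if t and int(t) > cutoff]
    if len(recent) >= max_requests:
        # Over the limit: reject, but still persist the pruned list.
        return False, ",".join(str(t) for t in recent)
    recent.append(now)
    return True, ",".join(str(t) for t in recent)
```

In this sketch a caller would load the stored string for the API key, call `check_rate_limit(stored, int(time.time()), max_requests)`, write the returned string back, and only forward the request upstream when the first return value is `True`. Keeping the pruning and the check in one function means the stored string never grows beyond `max_requests` entries.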
Originally posted by u/Mikeeeyy04 on r/ArtificialInteligence
