Genuine question, because I keep running into it. I was helping a friend debug their agent stack last week: three different provider SDKs imported directly, retry logic duplicated across five files, and a try/except block doing what looked like a poor man's fallback to a different model. This is at a seed-funded startup.

I know everyone reading this knows what an LLM gateway is. The pitch hasn't changed in two years: unified API, fallback, caching, cost tracking, virtual keys, observability. Same talking points across Bifrost, LiteLLM, Kong AI Gateway, Cloudflare AI Gateway, take your pick. But the cost case has actually shifted under us, and I don't see people talking about it.

We pulled 30 days of our agent traffic at my last check. Stuff that gateways now solve out of the box that we were hand-rolling:

- Semantic caching cut our token spend by ~31% on a customer support agent. Repetitive queries we were billing for every single time.
- Fallback config replaced ~400 lines of provider-specific retry code. We haven't deleted the old code yet, but we will.
- Per-team virtual keys finally let our finance person stop asking me which prompt cost $1,800 last Tuesday.

If you're 6+ months in and still calling provider SDKs directly, you're paying for that decision in token spend and on-call pages. Should have moved earlier, honestly.
Originally posted by u/Otherwise_Flan7339 on r/ArtificialInteligence
