eifachposte

eifachposte

Not a paper, this is production engineering notes from the last few months of trying to unify our team’s access to openai, anthropic, and google models behind a single internal interface. We have services that need to call all three depending on the task and the question of “how do we make this not painful” turns out to be more interesting than i expected. Nothing here is novel research, but i don’t see this written up much, so. Approach 1, normalize everything to openai chat completions format. This is the de facto industry default, the openai sdk shape is the lingua franca, most observability tooling speaks it. For plain chat completions it’s fine. The cracks show up around three things specifically: Tool/function calling schemas. Anthropic’s tool_use/tool_result content blocks don’t map cleanly to openai’s tool_calls structure on the round-trip. You can flatten it, but you lose the parallel tool call semantics and the ordered content blocks claude uses internally. On our internal eval (n=80 multi-turn tool-use scenarios, scoring tool selection accuracy + argument correctness) we measured a drop from 0.87 native-claude to 0.79 when we forced the openai normalization, consistent across three runs. Small sample, not peer-reviewed, but the direction was clear enough that we stopped investing in that path. Streaming formats. Anthropic uses event-typed sse (message_start, content_block_delta, etc.), openai uses delta chunks, gemini’s streaming has its own shape. Wrappers handle the common case but the moment you need fine-grained streaming control (e.g., for tool calls in flight) the abstraction tends to leak. Safety/system controls. Gemini’s safety settings, anthropic’s system prompt handling, and openai’s developer message behavior all have subtly different semantics. “Translate everything to system role” loses information. Approach 2, keep native sdks per service, route at the application layer. Preserves full provider semantics. Cost is that you maintain three sdk integrations, three retry/timeout/auth code paths, and the routing logic becomes part of every service that needs multi-provider access. We found the maintenance burden grew faster than the feature value as we added providers. Approach 3, gateway that exposes multiple api specs natively rather than normalizing to one. Less common as a pattern. We evaluated portkey and tokenrouter squarely in this category. LiteLLM proxy mode is adjacent but not quite the same thing: its default behavior is openai-format normalization, which puts it closer to approach 1 for most usage patterns, though it can be configured for provider-native passthrough on specific routes. The appeal of the native-spec end of this space is that existing client code keeps speaking whichever sdk it was already written against. Tradeoff is that you’re now relying on the gateway to track upstream api changes, which is a real maintenance burden you’ve outsourced rather than eliminated. If the gateway falls behind on a new feature (extended thinking, computer use, structured output extensions, etc.) you’re stuck. A related question we haven’t resolved: when a primary upstream provider degrades (we got a small taste of this during a late-april anthropic capacity event), pure-proxy gateways have nowhere to fall back to within that provider. Some gateways keep their own inference capacity behind the routing layer as a last-resort path, others don’t. Whether that’s actually useful depends entirely on what models the fallback path can serve, since obviously a llama-class fallback won’t substitute for opus on the workload that needed opus in the first place. For our use case we treat it as a degraded-mode option rather than a real substitute. I don’t have a clean answer on the quality cost of approach 1 vs approach 2 at scale. Our internal eval was small enough that i wouldn’t put it in a paper, but the directional finding (measurable degradation on agentic tool-use tasks when normalizing to chat completions) was consistent enough that we decided not to ship that path for our use case. submitted by /u/OutsideFood1

Originally posted by u/OutsideFood1 on r/ArtificialInteligence

Notes on multi-provider llm api compatibility, three approaches we tried

Notes on multi-provider llm api compatibility, three approaches we tried