Original Reddit post

I built rosetta-llm — an open-source multi-format LLM proxy that acts as a drop-in Claude Code gateway.

  • Works as a Claude Code LLM gateway — set ANTHROPIC_BASE_URL and all configured models appear in the /model picker (quick example after this list)
  • Translates between formats — Anthropic Messages ↔ OpenAI Chat ↔ OpenAI Responses at the wire level
  • Thinking blocks round-trip correctly — this is the hard part and why I built this
  • Provider routing — openai/gpt-5.4, anthropic/claude-opus-4-7, groq/llama-4 all through one endpoint
  • Streaming on everything — passthrough fast path + cross-format translation with proper SSE handling
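
For the gateway bit, a minimal sketch of the setup, assuming the proxy is already running locally on port 7860 (the port the Docker command further down publishes):

# Point Claude Code at the proxy instead of api.anthropic.com
export ANTHROPIC_BASE_URL="http://localhost:7860"
# Launch Claude Code; configured models should now appear in the /model picker
claude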

The thinking-block problem

Most proxies lose reasoning continuity. LiteLLM has had open PRs for thinking block handling for a long time — some dating back months — and they’re still not merged. Without proper round-tripping, prompt caching breaks across turns and Claude Code loses context. Rosetta encodes encrypted reasoning into Anthropic’s signature field and decodes it back — so multi-turn agentic workflows keep their prompt-cache hits.
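
To make the round-trip concrete, here's an illustrative request shape. This is a sketch, not the documented API: the /v1/messages path is assumed because that's what Claude Code appends to ANTHROPIC_BASE_URL, and every value below is a placeholder. The point is that the previous assistant turn carries its thinking block back, with the encoded reasoning riding in the signature field:

# Multi-turn Anthropic-format request through the proxy (path and values illustrative)
curl -s http://localhost:7860/v1/messages \
  -H "content-type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Refactor this function."},
      {"role": "assistant", "content": [
        {"type": "thinking",
         "thinking": "reasoning text from the previous turn...",
         "signature": "opaque blob where Rosetta stashes the encoded reasoning"},
        {"type": "text", "text": "Here is a first pass..."}
      ]},
      {"role": "user", "content": "Now add tests."}
    ]
  }'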

Zero-setup Hugging Face Space

Literally a two-line Dockerfile:

FROM ghcr.io/lokesh-chimakurthi/rosetta-llm:latest
COPY --chown=app:app config.json /app/config.json

Drop a config.json and the Dockerfile above into an HF Space (Docker SDK) and it’s running. No clone, no build, no venv. The GHCR image has everything baked in.

Also works with:

# No install — ephemeral
uvx rosetta-llm
# Persistent install
uv tool install rosetta-llm
rosetta-llm --config ~/.rosetta-llm/config.json
# Docker
docker run -p 7860:7860 \
-v ~/.rosetta-llm/config.json:/app/config.json \
ghcr.io/lokesh-chimakurthi/rosetta-llm:latest
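
And once it's up, a quick smoke test from the OpenAI side of the translation layer. The /v1/chat/completions path is my assumption (standard OpenAI wire format); auth, if any, depends on your config:

# OpenAI Chat-format request to the same proxy, streamed
curl -s http://localhost:7860/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{
    "model": "anthropic/claude-opus-4-7",
    "messages": [{"role": "user", "content": "Say hi in one word."}],
    "stream": true
  }'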

Why another proxy?

I looked at existing solutions:

  • LiteLLM — thinking block round-trip PRs going nowhere, too many abstractions
  • OpenRouter — great but closed-source, no self-hosting
  • Direct passthrough proxies — don’t translate between formats

Nothing gave me lossless cross-format translation with proper reasoning fidelity.

Contributions welcome

I built this for myself and it works for my use cases. But there’s a lot more it could do — better multimodal handling, embeddings support, rate limiting, an admin UI. If any of this sounds interesting, PRs are absolutely welcome. Happy to answer questions in the comments.

Originally posted by u/DataNebula on r/ClaudeCode