I built rosetta-llm — an open-source multi-format LLM proxy that acts as a drop-in Claude Code gateway.
- Works as a Claude Code LLM gateway — set `ANTHROPIC_BASE_URL` and all configured models appear in the `/model` picker
- Translates between formats — Anthropic Messages ↔ OpenAI Chat ↔ OpenAI Responses at the wire level
- Thinking blocks round-trip correctly — this is the hard part and why I built this
- Provider routing — `openai/gpt-5.4`, `anthropic/claude-opus-4-7`, `groq/llama-4`, all through one endpoint
- Streaming on everything — passthrough fast path + cross-format translation with proper SSE handling
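Routing is config-driven. I don't have the project's exact schema in front of me, so treat this `config.json` as a purely illustrative sketch — the field names here are hypothetical, and the repo's README has the real format:

```json
{
  "models": [
    { "name": "openai/gpt-5.4",            "api_key_env": "OPENAI_API_KEY" },
    { "name": "anthropic/claude-opus-4-7", "api_key_env": "ANTHROPIC_API_KEY" },
    { "name": "groq/llama-4",              "api_key_env": "GROQ_API_KEY" }
  ]
}
```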
The thinking-block problem
Most proxies lose reasoning continuity. LiteLLM has had open PRs for thinking block handling for a long time — some dating back months — and they’re still not merged. Without proper round-tripping, prompt caching breaks across turns and Claude Code loses context.
Rosetta encodes encrypted reasoning into Anthropic’s signature field and decodes it back — so multi-turn agentic workflows keep their prompt-cache hits.
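The round-trip idea can be sketched in a few lines of Python. This is not Rosetta's actual implementation — the envelope format and function names below are my own invention — but it shows the shape of the trick: wrap the provider's opaque reasoning payload in a string that rides along in Anthropic's `signature` field, then unwrap it unchanged on the next turn.

```python
import base64
import json

# Illustrative sketch only: pack a provider's opaque (already-encrypted)
# reasoning payload into a single string suitable for Anthropic's
# `signature` field, so it survives a client round trip intact.

def encode_signature(provider: str, reasoning_payload: str) -> str:
    """Wrap the provider name and reasoning payload in a transportable string."""
    envelope = json.dumps({"p": provider, "r": reasoning_payload})
    return base64.urlsafe_b64encode(envelope.encode()).decode()

def decode_signature(signature: str) -> tuple[str, str]:
    """Recover the provider name and the original reasoning payload."""
    envelope = json.loads(base64.urlsafe_b64decode(signature.encode()))
    return envelope["p"], envelope["r"]

sig = encode_signature("openai", "opaque-encrypted-reasoning")
assert decode_signature(sig) == ("openai", "opaque-encrypted-reasoning")
```

Because the payload comes back byte-for-byte identical, the provider sees the same reasoning context on the next turn, which is what keeps prompt-cache hits alive.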
Zero-setup Hugging Face Space
Literally a two-line Dockerfile:
```dockerfile
FROM ghcr.io/lokesh-chimakurthi/rosetta-llm:latest
COPY --chown=app:app config.json /app/config.json
```
Drop a config.json file and the Dockerfile above into an HF Space (Docker SDK) and it's running. No clone, no build, no venv. The GHCR image has everything baked in.
Also works with uvx, uv, or Docker:

```shell
# No install — ephemeral
uvx rosetta-llm

# Persistent install
uv tool install rosetta-llm
rosetta-llm --config ~/.rosetta-llm/config.json

# Docker
docker run -p 7860:7860 \
  -v ~/.rosetta-llm/config.json:/app/config.json \
  ghcr.io/lokesh-chimakurthi/rosetta-llm:latest
```
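Once the proxy is up, pointing Claude Code at it is an environment variable. A sketch (the port matches the Docker example above; whether a placeholder API key suffices depends on your config, since the proxy holds the real provider keys):

```shell
# Point Claude Code at the local proxy (illustrative setup sketch).
export ANTHROPIC_BASE_URL="http://localhost:7860"
# Placeholder key — the proxy's config supplies the real provider keys.
export ANTHROPIC_API_KEY="dummy"
# Then launch Claude Code as usual:
# claude
```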
Why another proxy?
I looked at existing solutions:
- LiteLLM — thinking block round-trip PRs going nowhere, too many abstractions
- OpenRouter — great but closed-source, no self-hosting
- Direct passthrough proxies — don't translate between formats

Nothing gave me lossless cross-format translation with proper reasoning fidelity.
Links
- GitHub: https://github.com/Lokesh-Chimakurthi/rosetta-llm
- PyPI: https://pypi.org/project/rosetta-llm/
Contributions welcome
I built this for myself and it works for my use cases. But there’s a lot more it could do — better multimodal handling, embeddings support, rate limiting, an admin UI. If any of this sounds interesting, PRs are absolutely welcome. Happy to answer questions in the comments.
Originally posted by u/DataNebula on r/ClaudeCode
