Been trying to hook Claude Code to NVIDIA NIM models through the fcc-server/free-claude-code proxy method and honestly the latency is killing me. https://github.com/Alishahryar1/free-claude-code?utm_source=chatgpt.com What I basically did: • edited .fcc/.env • changed model routing to DeepSeek / Kimi / GLM etc • changed config.json • changed .claude/settings.json • ran fcc-server • pointed Claude to 127.0.0.1:8082 And technically it works. Claude UI opens, models switch properly, requests go through etc. But the actual experience is rough. Kimi randomly slows down. DeepSeek v4 Pro takes forever. DeepSeek Flash literally churned for 5 mins on just “hi” and then threw provider API failed. Qwen also feels slower than expected inside Claude Code. I also noticed Claude web search/tool stuff doesn’t properly work with NVIDIA upstreams because of Anthropic tools incompatibility. Is there a smoother / cleaner / lower latency way to use non-Anthropic models inside Claude Code? Like: • OpenRouter? • LiteLLM? • some other proxy? • direct MCP style setup? • different frontend entirely? Main thing I want: Claude Code style workflow + fast coding model + decent tool use without these insane delays. Anyone here got a stable setup? submitted by /u/OkPaleontologist9770
Originally posted by u/OkPaleontologist9770 on r/ClaudeCode
