Claude Code sessions cost real money. A significant chunk of that usage is mechanical: classify files, format JSON, turn notes into tables, extract fields, populate templates. None of it needs Sonnet. All of it costs the same. So I built a one-tool MCP server that offloads that class of work to DeepSeek. Claude keeps the hard problems; DeepSeek handles the rote work at a fraction of the cost.

One tool: `deepseek(prompt, system?, model?)`. It takes text, returns text. stdio transport, no daemon, no database, no local server. Every response comes back with metadata: model, latency, token count, so you see what you paid for each call.

**Cost, with real numbers:** Flash (default) handles most mechanical tasks; Pro is there when you need a bit more. A hundred small tasks: $0.03. That is the entire utility proposition.

**Why DeepSeek V4 Flash specifically:** I picked it after running a blind cross-model benchmark in OpenCode some time ago. Seven frontier models ranked each other on a fixed framework: intelligence, reasoning, openness, ecosystem, context, speed, and price. Two independent arbiters reconciled the results. Flash ranked first. Not because it is the smartest (Pro held that), but because 1M context, MIT open weights, and sub-$0.30/M input add up before any benchmark score enters the picture. For mechanical work, the cost and context gap matters more than the reasoning gap.

**What lands on DeepSeek now:**

- "Extract all TODOs from this 400-line file and group them by owner."
- "Classify these 200 filenames into doc/code/config. Mark uncertainty. Return JSON only."
- "Turn this rough meeting note into a clean CSV with date, owner, action."
- "Summarize this 12-page packet for review. Do not propose decisions, do not evaluate. Just condense."

**What stays on Claude:** architecture decisions, security review, anything a human will read, judgment calls where the hard part is picking what matters.

**Validation:** six independent runs across two different task families, zero factual errors. MCP inline and OpenCode/relay-style output were factually equivalent. The difference is annotation depth, not accuracy.

**Caveats:** it is a supervised worker. No tool calls, no file operations, no multi-step chains. Latency varies: a 100-line classification might return in 3 seconds, a large summarization prompt might take 25. The output needs review, same as any LLM. And it only works with MCP-compatible clients (Claude Code, Codex).

**Setup:**

```bash
pip install "git+https://github.com/arizen-dev/deepseek-mcp.git"
export DEEPSEEK_API_KEY="sk-…"
```

Add it to `.mcp.json`, restart Claude Code, and `/mcp` confirms the server is registered. Config examples for both Claude Code and Codex are in the README.

Repo: https://github.com/arizen-dev/deepseek-mcp (MIT, Python 3.10+, single dep: `openai`)
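For reference, the `.mcp.json` entry is roughly this shape. The server name and command here are my guesses, not copied from the repo, so check the README for the exact values:

```json
{
  "mcpServers": {
    "deepseek": {
      "command": "deepseek-mcp",
      "args": [],
      "env": {
        "DEEPSEEK_API_KEY": "sk-…"
      }
    }
  }
}
```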
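And since the whole thing is one tool over an OpenAI-compatible endpoint, the server is roughly this shape. A minimal sketch, not the repo's actual code: the real project lists only `openai` as a dependency, so it presumably speaks MCP's stdio protocol itself, while this sketch cheats and uses the MCP Python SDK's `FastMCP` for readability. The default model name and the metadata footer format are illustrative assumptions.

```python
import os
import time

from mcp.server.fastmcp import FastMCP
from openai import OpenAI

# Sketch only: the real repo's single dep is `openai`; FastMCP is used here
# just to keep the MCP plumbing out of the way.
server = FastMCP("deepseek")

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
)


@server.tool()
def deepseek(prompt: str, system: str | None = None, model: str = "deepseek-chat") -> str:
    """Send text, get text back, with model/latency/token metadata appended."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})

    start = time.monotonic()
    resp = client.chat.completions.create(model=model, messages=messages)
    latency = time.monotonic() - start

    # Metadata footer so every call shows what it cost.
    return (
        f"{resp.choices[0].message.content}\n\n"
        f"[model={resp.model} latency={latency:.1f}s tokens={resp.usage.total_tokens}]"
    )


if __name__ == "__main__":
    server.run()  # stdio transport by default
```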
Originally posted by u/petburiraja on r/ClaudeCode
