Original Reddit post

Disclosure: I’m the sole developer of vexp. It’s a commercial product. No referral links in this post, just a direct link to the site. I’m posting because people asked for benchmark data after my first post.

**What vexp does:** It’s an MCP server that pre-indexes your codebase into a dependency graph (tree-sitter + SQLite, runs 100% locally). Instead of Claude exploring your project file-by-file with Read/Grep/Glob, vexp returns the relevant code in a single `run_pipeline` call: graph-ranked, with full content for pivot nodes and compact skeletons for supporting code.

**Who benefits:** Developers on API billing working on medium-to-large codebases (say, 50+ files) where Claude burns tokens exploring irrelevant code. It won’t help on small projects or single-file edits.

**The benchmark:** I ran it on FastAPI v0.115.0, the actual open-source repo, ~800 Python files. 7 tasks (bug fixes, features, refactors, code understanding), 3 runs per task per arm, 42 total executions. Claude Sonnet 4.6. Both arms ran in full isolation with `--strict-mcp-config`, collected via headless `claude -p`. Total across 42 runs: $16.29 baseline vs $6.89 with vexp.

**What surprised me:** The output token drop. 504 → 189 means Claude isn’t just reading less; it’s generating less irrelevant output too. When the input context is focused, the responses get focused. I didn’t explicitly design for that. Without vexp, Claude loads ~40K+ tokens of context through incremental file reads. With vexp, it gets ~8K tokens of graph-ranked context in one shot.

**Per-task breakdown:** Code understanding and refactoring benefit the most (−57% to −73%). Bug fixes benefit the least (−30%): when the problem is already localized, there's less wasted exploration to cut.

**Setup:**

```json
{
  "mcpServers": {
    "vexp": {
      "command": "vexp",
      "args": ["mcp"]
    }
  }
}
```

Add this to `~/.claude/settings.json`, then run `vexp index` in your project root. The next session picks it up.

**What changed since my last post:** Session memory is now linked to the code graph.
When you start a new Claude Code session, vexp remembers what you explored last time. When the underlying code changes, stale memories get flagged automatically. This ended up saving me more time day-to-day than the token reduction.

Site: vexp.dev

Happy to answer questions about the methodology or architecture. If you run it on your own codebase, I’d be curious what numbers you see, especially on repos larger than FastAPI.
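For people asking what "pre-indexes your codebase into a dependency graph (SQLite)" means concretely, here's a minimal sketch of the shape of the idea. It is not vexp's actual code or schema: it uses Python's stdlib `ast` in place of tree-sitter and only records import-level edges, and all table and function names are illustrative.

```python
# Sketch: pre-index a Python codebase into a SQLite dependency graph.
# Stdlib `ast` stands in for tree-sitter; edges are module imports only.
import ast
import sqlite3
from pathlib import Path


def build_index(root: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Walk *.py files under root; store files as nodes, imports as edges."""
    conn = sqlite3.connect(db_path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS nodes (path TEXT PRIMARY KEY);
        CREATE TABLE IF NOT EXISTS edges (src TEXT, dst TEXT);
        """
    )
    for py_file in Path(root).rglob("*.py"):
        conn.execute("INSERT OR IGNORE INTO nodes VALUES (?)", (str(py_file),))
        tree = ast.parse(py_file.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            # Record each module-level import as a dependency edge.
            if isinstance(node, ast.Import):
                for alias in node.names:
                    conn.execute(
                        "INSERT INTO edges VALUES (?, ?)",
                        (str(py_file), alias.name),
                    )
            elif isinstance(node, ast.ImportFrom) and node.module:
                conn.execute(
                    "INSERT INTO edges VALUES (?, ?)",
                    (str(py_file), node.module),
                )
    conn.commit()
    return conn
```

A real indexer works at symbol level (functions, classes, call sites) across languages and updates incrementally, but the graph-in-SQLite structure is the same.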
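Similarly, "full content for pivot nodes, compact skeletons for supporting code" can be sketched like this. Again purely illustrative, not vexp's implementation: a skeleton keeps only the first line of each `def`/`class` signature, and the ranking is assumed to come from the graph step.

```python
# Sketch: assemble focused context from a ranked file list.
# Top-ranked "pivot" files keep full text; the rest are skeletonized.
import ast


def skeleton(source: str) -> str:
    """Reduce a module to def/class signature lines; bodies are dropped."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Keep the first line of the signature as a compact stub.
            header = source.splitlines()[node.lineno - 1].strip()
            lines.append(header + " ...")
    return "\n".join(lines)


def assemble_context(files: dict[str, str], ranking: list[str], pivots: int = 1) -> str:
    """Full content for the top `pivots` files, skeletons for the rest."""
    parts = []
    for i, path in enumerate(ranking):
        body = files[path] if i < pivots else skeleton(files[path])
        parts.append(f"# --- {path} ---\n{body}")
    return "\n\n".join(parts)
```

This is where the ~40K → ~8K context reduction would come from: supporting files shrink to a few signature lines instead of full reads.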
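And for the session-memory staleness question: one simple way to flag memories when code changes is to store a content hash alongside each note and compare on recall. This is a guess at the mechanism, with made-up names, not how vexp actually does it.

```python
# Sketch: session notes pinned to a content hash of the code they describe.
# A note is flagged stale when its file changed or disappeared.
import hashlib


def fingerprint(source: str) -> str:
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class SessionMemory:
    def __init__(self) -> None:
        self._notes: list[tuple[str, str, str]] = []  # (note, path, hash)

    def remember(self, note: str, path: str, source: str) -> None:
        """Store a note with the hash of the file it refers to."""
        self._notes.append((note, path, fingerprint(source)))

    def recall(self, current: dict[str, str]) -> list[tuple[str, bool]]:
        """Return (note, is_stale) pairs against the current file contents."""
        out = []
        for note, path, saved_hash in self._notes:
            src = current.get(path)
            stale = src is None or fingerprint(src) != saved_hash
            out.append((note, stale))
        return out
```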

Originally posted by u/Objective_Law2034 on r/ClaudeCode