Original Reddit post

Tool: https://graperoot.dev/
Explore the website; it even has a playground if your brain feels fatigued after looking at benchmarks :)

I have been using Claude Code on large repos (10K to 17K files) and kept noticing the same issue: it spends most of its time just finding files instead of solving the task. On Sentry's repo (~17.6K files), a single prompt takes ~5.6 minutes, costs ~$1.22, and opens 40–50 files to use maybe 3–5. Roughly 60% of tokens go to irrelevant context.

So I stopped trying to prompt better and fixed retrieval instead. I built a small MCP server that pre-indexes file relationships (imports, references) and uses BM25 to rank files before the model runs. The one-time scan takes ~30 seconds; after that, every prompt starts with the right context instead of grep wandering.

I ran a blind test (same model, same prompts, LLM judge scoring):

┌────────────────────┬─────────────┬───────────────┐
│                    │ GrapeRoot   │ Normal Claude │
├────────────────────┼─────────────┼───────────────┤
│ Avg Quality        │ 82.0        │ 64.6          │
│ Avg Cost/Prompt    │ $0.71       │ $1.22         │
│ Avg Time           │ 2.2 min     │ 5.6 min       │
│ Win Rate           │ 100%        │ 0%            │
└────────────────────┴─────────────┴───────────────┘

The biggest difference showed up in a security audit. Both runs cost about the same, but mine explored 40+ files across packages and found a real vulnerability with a fix. Default Claude stayed in one directory, checked a few files, and missed it.

This is not a model problem; it is a context problem. Right now, a big chunk of tokens is wasted on figuring out where to look. If you remove that, all tokens go into actual reasoning, and output quality jumps.

The stack is simple: MCP + BM25 + file graph, fully local, no embeddings, no vector DB. Tested across 7 repos (Python, TS, Go, Rust, Java, C++), with the same pattern everywhere.

Honest take: if you are working on non-trivial repos, you are probably burning 50–70% of tokens on bad retrieval without realizing it.

submitted by /u/intellinker
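For readers curious what "pre-index imports, rank with BM25" could look like in practice, here is a minimal stdlib-only sketch. This is not GrapeRoot's actual implementation or API; the names (`scan_repo`, `bm25_rank`) are hypothetical, and it only scans Python files with a regex rather than a real parser:

```python
# Hypothetical sketch: index a repo's Python files, record their imports,
# and rank files against a query with plain BM25. Stdlib only.
import math
import re
from collections import Counter
from pathlib import Path

def tokenize(text):
    """Split source text into lowercase identifier-like tokens."""
    return re.findall(r"[A-Za-z_]\w*", text.lower())

def scan_repo(root):
    """One-time scan: return {path: tokens} and {path: imported modules}."""
    docs, imports = {}, {}
    for p in Path(root).rglob("*.py"):
        src = p.read_text(errors="ignore")
        docs[str(p)] = tokenize(src)
        # Crude import extraction; a real tool would use an AST parser.
        imports[str(p)] = re.findall(r"^\s*(?:from|import)\s+([\w.]+)", src, re.M)
    return docs, imports

def bm25_rank(docs, query, k1=1.5, b=0.75):
    """Return file paths sorted by BM25 relevance to the query."""
    n = len(docs)
    avgdl = sum(len(t) for t in docs.values()) / max(n, 1)
    df = Counter()                      # document frequency per term
    for toks in docs.values():
        df.update(set(toks))
    q_terms = tokenize(query)
    scores = {}
    for path, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[path] = score
    return sorted(scores, key=scores.get, reverse=True)
```

The top-ranked paths would then be handed to the model as starting context (e.g. via an MCP tool result), so it never has to grep its way to the relevant files.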

Originally posted by u/intellinker on r/ArtificialInteligence