Original Reddit post

I’ve been building an open source codebase intelligence tool that exposes 9 MCP tools to Claude Code. One of its layers scores every file from 1 to 10 using 15 deterministic biomarkers, including complexity, nesting, duplication, test coverage gaps, ownership patterns, and co-change coupling. The analysis itself is pure AST parsing via tree-sitter plus git history, with no LLM calls.

The idea: Claude Code is good at writing and refactoring code, but it doesn’t know which files need attention. This layer tells it. With one MCP call, Claude gets a ranked list of refactoring targets with specific findings for each file.

To make sure the scores aren’t noise, I ran a time-travel experiment on FastAPI, Pydantic, and Django: score every file, then count the bug-fix commits over the next 6 months. On Django (542 files), 14 of the 20 worst-scoring files had real bugs. The correlation was statistically significant across all three repos (p < 0.0001 on Django). The top predictors were untested hotspots and developer congestion, not complexity metrics.

Keeping this layer deterministic was a deliberate choice: health scores need to be reproducible and fast. Run it once, pipe the results to Claude, and let Claude handle the reasoning about what to fix and how.

If you want to run it on your own codebase:
GitHub: https://github.com/repowise-dev/repowise
License: AGPL-3.0, runs locally
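Co-change coupling is one of the biomarkers mentioned. The post doesn’t say how repowise computes it, so here is a minimal sketch of one common definition, assuming each commit has already been reduced to the set of files it touched: count how often each pair of files changes together, then divide by the change count of the less frequently changed file in the pair. The function name `co_change_coupling` and the `min_shared` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_coupling(
    commits: list[set[str]], min_shared: int = 2
) -> dict[tuple[str, str], float]:
    """Coupling strength for file pairs that changed together >= min_shared times.

    Hypothetical definition for illustration: shared commits divided by the
    number of commits touching the less frequently changed file of the pair.
    """
    file_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for files in commits:
        file_counts.update(files)
        # sorted() gives each unordered pair a stable (a, b) key
        pair_counts.update(combinations(sorted(files), 2))

    coupling: dict[tuple[str, str], float] = {}
    for (a, b), shared in pair_counts.items():
        if shared >= min_shared:
            coupling[(a, b)] = shared / min(file_counts[a], file_counts[b])
    return coupling
```

A value of 1.0 means that every time the rarer file changed, the other file changed in the same commit; the `min_shared` floor filters out pairs that only co-occurred once by chance.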

Originally posted by u/Obvious_Gap_5768 on r/ClaudeCode