Original Reddit post

I work on a distributed backend system split across multiple microservices in separate repos. Understanding how a failure propagates across services is non-trivial even for experienced team members. I’ve been using Claude Code with context files describing each service’s role, key code paths, and gotchas. It’s been surprisingly useful for ad-hoc questions, but I want something more structured for the whole team. The goals are:

  • Failure diagnosis: given an error or stuck state, identify where things broke and the likely cause
  • Codebase onboarding: new engineers ask “how does X work” and get accurate answers grounded in actual code, not outdated docs
  • Design questions: “why does service A call B instead of C”, “what would break if we changed this interface”

What I’m trying to figure out:

  • Is a well-maintained context file + Claude Code essentially the ceiling, or is there a meaningfully better setup — RAG over the codebase, a code index, custom tooling?
  • For failure diagnosis specifically: is there value in feeding structured schemas or state machine definitions upfront, vs letting the model find them on demand?
  • Anyone running something like this for a team (not just personal use)? How do you keep context accurate as the codebase evolves?

Curious what’s worked for people with genuinely complex, multi-service systems rather than a single-repo app.
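
To make the failure-diagnosis question more concrete, here is a minimal sketch of what a state machine definition "fed upfront" might look like. Everything in it is hypothetical and for illustration only: the states, the service names, and the helpers `suspects_for_stuck_state` and `render_for_context` do not come from any real system or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    source: str   # state before the transition
    target: str   # state after the transition
    service: str  # hypothetical service responsible for performing it


# Hypothetical order workflow; states and service names are illustrative only.
TRANSITIONS = [
    Transition("CREATED", "PAYMENT_PENDING", "order-service"),
    Transition("PAYMENT_PENDING", "PAID", "payment-service"),
    Transition("PAYMENT_PENDING", "CANCELLED", "payment-service"),
    Transition("PAID", "SHIPPED", "fulfillment-service"),
]


def suspects_for_stuck_state(state: str) -> list[str]:
    """Return the services that own a transition out of `state`.

    If a record is stuck in `state`, these are the first places to look.
    """
    return sorted({t.service for t in TRANSITIONS if t.source == state})


def render_for_context(transitions: list[Transition]) -> str:
    """Render the transitions as plain text suitable for a context file."""
    return "\n".join(
        f"{t.source} -> {t.target} (owner: {t.service})" for t in transitions
    )


if __name__ == "__main__":
    print(render_for_context(TRANSITIONS))
    print(suspects_for_stuck_state("PAYMENT_PENDING"))
```

The point of a definition like this is that the same data can be rendered into a context file and also queried directly, so a "stuck in state X" question starts from an explicit list of owning services instead of a search through several repos.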

Originally posted by u/Luminancee on r/ClaudeCode