Original Reddit post

Classic wall: my retrieval pipeline was great at “what’s our refund policy” and useless at anything relational or time-based, e.g., “why did we switch vendors” or “how did this decision evolve.” Embeddings flatten exactly the causal and temporal structure those questions need, so the agent confidently returned nonsense.

I’d read a good write-up on why RAG breaks this way, and instead of letting it rot in a bookmark, I turned it into a reusable Claude Code skill. The result is a skill called diagnosing-rag-failure-modes, and the outcome is the part I care about: I hand Claude a failing query and it returns a structured failure classification plus the specific architecture fix, not vibes. It buckets the failure into one of four patterns and prescribes the matching fix:

  • multi-hop relational failure → knowledge graph
  • temporal sequencing failure → timeline index
  • organizational context failure → structured provenance ingestion
  • scale failure → tiered retrieval

So “my RAG is bad” becomes “this is a temporal sequencing failure, add a timeline index.” That’s a diagnosis I can act on.

The part that surprised me was how little I did to create it. I have Loreto wired in as an MCP server, so I just told Claude, in my editor:

“Use Loreto to extract skills from <the article URL>”

Claude called the generate_skills tool, got back the full skill package, and wrote it straight into .claude/skills/. No web UI, no copy-paste, no “now go format this into a SKILL.md.” I told it to turn the literature into a skill, and it did.

And the skill it produced isn’t a summary blob, which is the whole point. It comes out following actual skill-authoring best practices, with the artifacts that make a skill usable instead of decorative:

  • SKILL.md with named failure modes and the causal mechanism for each (why it happens, not “it’s complex”), plus a Mermaid diagram showing the wrong vs. right architecture side by side
  • reference files with real, runnable content: actual graph-query templates and schemas, not placeholders
  • a runnable test that checks concrete behavior
  • a README for context

Skill here: https://loreto.io/skills/diagnosing-rag-failure-modes

That bundle is what makes Claude actually apply it correctly on the next task instead of re-deriving the same reasoning (and burning the same tokens) every time.

I’m currently building Loreto (https://loreto.io/) in public. Genuinely curious and would love to learn from those of you running RAG in prod: how are you handling the relational/temporal queries embeddings can’t? Knowledge graph, hybrid layers, something else? And is anyone else turning the good articles they read into reusable agent skills, or still letting them die in a bookmarks folder?

submitted by /u/Classic_Display9788
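
For anyone who wants the post's four-pattern mapping in code form, here is a minimal Python sketch. It is not the contents of the actual skill: `FailureMode`, `FIXES`, `Diagnosis`, and `diagnose` are hypothetical names made up for illustration, and the classification step itself (which the post has Claude do) is left to the caller.

```python
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    """The four failure patterns named in the post (enum names are illustrative)."""
    MULTI_HOP_RELATIONAL = "multi-hop relational failure"
    TEMPORAL_SEQUENCING = "temporal sequencing failure"
    ORGANIZATIONAL_CONTEXT = "organizational context failure"
    SCALE = "scale failure"


# Failure pattern -> architecture fix, copied from the list in the post.
FIXES = {
    FailureMode.MULTI_HOP_RELATIONAL: "knowledge graph",
    FailureMode.TEMPORAL_SEQUENCING: "timeline index",
    FailureMode.ORGANIZATIONAL_CONTEXT: "structured provenance ingestion",
    FailureMode.SCALE: "tiered retrieval",
}


@dataclass(frozen=True)
class Diagnosis:
    """A failing query, its assigned failure pattern, and the matching fix."""
    query: str
    mode: FailureMode
    fix: str

    def summary(self) -> str:
        return f"{self.mode.value} -> {self.fix}"


def diagnose(query: str, mode: FailureMode) -> Diagnosis:
    """Attach the prescribed fix to an already-classified failing query.

    Assumption: the caller (a person or an agent) has already chosen the mode;
    this sketch only does the lookup, not the classification.
    """
    return Diagnosis(query=query, mode=mode, fix=FIXES[mode])


d = diagnose("how did this decision evolve", FailureMode.TEMPORAL_SEQUENCING)
print(d.summary())
```

Keeping the mapping as a plain dictionary keyed by an enum means an unknown pattern fails loudly with a `KeyError` instead of producing a made-up fix.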

Originally posted by u/Classic_Display9788 on r/ClaudeCode