
Don’t treat Claude Code like a smarter chatbot. It isn’t. The failures that accumulate over time (drifting context, degrading output quality, rules that get ignored) aren’t model failures. They’re architecture failures. Fix the architecture, and the model mostly takes care of itself.

Think of Claude Code as six layers: context, skills, tools and Model Context Protocol (MCP) servers, hooks, subagents, and verification. Neglect any one of them and it creates pressure somewhere else. The layers are load-bearing.

The execution model is a loop, not a conversation:

```
Gather context → Take action → Verify result → [Done or loop back]
      ↑               ↓
  CLAUDE.md    Hooks / Permissions / Sandbox
  Skills       Tools / MCP
  Memory
```

Wrong information in context causes more damage than missing information: the model acts confidently on bad inputs. And without a verification step, you won’t know something went wrong until several steps later, when untangling it is expensive.

The 200K context window sounds generous until you account for what’s already eating it. A single MCP server like GitHub exposes 20-30 tool definitions at roughly 200 tokens each. Connect five servers and you’ve burned ~25,000 tokens before sending a single message. Then the default compression algorithm quietly drops early tool outputs and file contents, which often contain architectural decisions you made two hours ago. Claude contradicts them and you spend time debugging something that was never a model problem. The fix is explicit compression rules in CLAUDE.md:

```markdown
## Compact Instructions

When compressing, preserve in priority order:

1. Architecture decisions (NEVER summarize)
2. Modified files and their key changes
3. Current verification status (pass/fail)
4. Open TODOs and rollback notes
5. Tool outputs (can delete, keep pass/fail only)
```
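The MCP overhead arithmetic earlier is worth making concrete. A quick sanity check in Python, using the rough estimates quoted above (25 is the midpoint of the 20-30 tools-per-server figure):

```python
# Rough MCP context overhead, using the estimates from the text:
# ~25 tool definitions per server (midpoint of 20-30), ~200 tokens each.
TOOLS_PER_SERVER = 25
TOKENS_PER_TOOL = 200
CONTEXT_WINDOW = 200_000

servers = 5
overhead_tokens = servers * TOOLS_PER_SERVER * TOKENS_PER_TOOL
share = overhead_tokens / CONTEXT_WINDOW

print(overhead_tokens)   # 25000 tokens gone before the first message
print(f"{share:.1%}")    # 12.5% of the window
```

One eighth of the window is spent before any work happens, which is why trimming unused MCP servers is usually the first context win.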

Before ending any significant session, I have Claude write a HANDOFF.md: what it tried, what worked, what didn’t, and what should happen next. The next session starts from that file instead of depending on compression quality.

Skills are the piece most people either skip or implement wrong. A skill isn’t a saved prompt. The descriptor stays resident in context permanently; the full body only loads when the skill is actually invoked. That means descriptor length has a real cost, and a good description tells the model when to use the skill, not just what’s in it.

```yaml
# Inefficient (~45 tokens)
description: |
  This skill helps you review code changes in Rust projects. It checks for
  common issues like unsafe code, error handling… Use this when you want to
  ensure code quality before merging.

# Efficient (~9 tokens)
description: Use for PR reviews with focus on correctness.
```
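In context, that description lives in a skill file’s frontmatter. A sketch, assuming Claude Code’s SKILL.md convention (the `name` field and the commented-out setting are from that convention as I understand it; the body steps are illustrative):

```markdown
---
name: pr-review
description: Use for PR reviews with focus on correctness.
# disable-model-invocation: true   # for side-effect skills (field name assumed)
---

# PR Review

1. Run the project linter on the diff and summarize findings.
2. Check error handling on every changed public function.
3. Flag any new `unsafe` blocks and require justification.
```

Only the frontmatter description costs resident tokens; the numbered steps load when the skill fires.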

Skills with side effects (config migrations, deployments, anything with a rollback path) should always disable model auto-invocation. Otherwise the model decides when to run them.

Hooks are how you move decisions out of the model entirely. Whether formatting runs, whether protected files can be touched, whether you get notified after a long task: none of that should depend on Claude remembering. For a mixed-language project, hooks trigger separately by file type:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit",
        "pattern": "*.rs",
        "hooks": [{
          "type": "command",
          "command": "cargo check 2>&1 | head -30",
          "statusMessage": "Checking Rust…"
        }]
      },
      {
        "matcher": "Edit",
        "pattern": "*.lua",
        "hooks": [{
          "type": "command",
          "command": "luajit -b $FILE /dev/null 2>&1 | head -10",
          "statusMessage": "Checking Lua syntax…"
        }]
      }
    ]
  }
}
```

Finding a compile error on edit 3 is much cheaper than finding it on edit 40. In a 100-edit session, 30-60 seconds saved per edit adds up fast.

Subagents are about isolation, not parallelism. A subagent is an independent Claude instance with its own context window and only the tools you explicitly allow. Codebase scans and test runs that generate thousands of tokens of output go to a subagent; the main thread gets a summary, and the garbage stays contained. Never give a subagent the same broad permissions as the main thread: that defeats the entire point.

Prompt caching is the layer nobody talks about, and it shapes everything above it. Cache hit rate directly affects cost, latency, and rate limits. The cache works by prefix matching, so order matters:

1. System prompt → static, locked
2. Tool definitions → static, locked
3. Chat history → dynamic, comes after
4. Current user input → last

Putting timestamps in the system prompt breaks caching on every request. Switching models mid-session is more expensive than staying on the original model, because you rebuild the entire cache from scratch. If you need to switch, do it via a subagent handoff.

Verification is the layer most people skip entirely. “Claude says it’s done” has no engineering value. Before handing anything to Claude for autonomous execution, define done concretely:

```markdown
## Verification

For backend changes:
- Run `make test` and `make lint`
- For API changes, update contract tests under `tests/contracts/`

Definition of done:
- All tests pass
- Lint passes
- No TODO left behind unless explicitly tracked
```
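A checklist like this can be mechanized so “done” is a command, not a judgment call. A minimal Python sketch (the `make` targets come from the checklist above; the `TRACKED:` tag is an assumed convention for marking tracked TODOs):

```python
import subprocess

def check(cmd):
    """Run one gate command; True only if it exited cleanly."""
    return subprocess.run(cmd).returncode == 0

def untracked_todos(lines):
    """TODO lines that don't carry a tracking tag ('TRACKED:' is an assumed convention)."""
    return [l for l in lines if "TODO" in l and "TRACKED:" not in l]

def definition_of_done(source_lines):
    """Done = tests pass, lint passes, no untracked TODOs remain."""
    return (
        check(["make", "test"])
        and check(["make", "lint"])
        and not untracked_todos(source_lines)
    )
```

Wired into a Stop hook or run manually, this turns the acceptance criteria into something that fails loudly instead of relying on the model’s self-report.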

The test I keep coming back to: if you can’t describe what a correct result looks like before Claude starts, the task isn’t ready. A capable model with no acceptance criteria still has no reliable way to know when it’s finished.

The control stack that actually holds is three layers working together: CLAUDE.md states the rule, the skill defines how to execute it, and the hook enforces it on critical paths. Any single layer has gaps; all three together close them.

Here’s a full breakdown covering context engineering, skill and tool design, subagent configuration, prompt caching architecture, and a complete project layout reference.

Originally posted by u/HuckleberryEntire699 on r/ClaudeCode