Most people treat a SKILL.md like a long prompt the agent reads. It’s actually a loader specification, and understanding the difference cuts context cost by roughly 3x without changing a single instruction.

There are three loading levels. Frontmatter (name and description) is always in context, every single turn, whether the skill is relevant or not: about 100 tokens per installed skill. The body loads only when the agent decides to invoke the skill. References and scripts load only when the body explicitly points to them.

Most skill files pack everything into the body. A 1,200-line monolith means every trigger loads all of it: 20% of the context window before the agent does any work. Refactored as a 180-line spine pointing to three reference files, the skill loads each reference only when the current task actually needs it, for a 7% context cost. Same instructions, same output, roughly 3x cheaper.

The savings compound. A skill at 7% instead of 20% lets you fit nearly three skills in the same budget, run longer sessions before compaction, and hit fewer context cliffs on long-horizon tasks.

The non-obvious gotcha: a model upgrade is not free. A skill tuned on Sonnet 4.6 can degrade on Opus. That is not a bug; more capable models interpret instructions rather than follow them literally. On Sonnet, “Short sentences” was applied with judgment; on Opus it became a hard constraint that produced choppy, unreadable prose. The fix is a small golden set of test prompts you rerun on every model bump.

What’s your current SKILL.md structure: monolithic or spine-and-references?
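
For anyone who wants to check the 20% versus 7% comparison, here is a rough sketch of the arithmetic in Python. The numeric constants come from the post; the linear cost-per-line model and all function names are my own assumptions for illustration, not a measurement of how any agent actually counts tokens.

```python
# Figures quoted in the post.
FRONTMATTER_TOKENS = 100   # approx. always-in-context cost per installed skill
MONOLITH_LINES = 1200      # monolithic SKILL.md body
MONOLITH_SHARE = 0.20      # context share when the monolith is triggered
SPINE_LINES = 180          # refactored body ("spine")
REFACTORED_SHARE = 0.07    # context share reported after the refactor

# ASSUMPTION (not from the post): cost scales linearly with line count.
SHARE_PER_LINE = MONOLITH_SHARE / MONOLITH_LINES


def always_on_tokens(installed_skills: int) -> int:
    """Level 1: frontmatter tokens paid every turn, relevant or not."""
    return installed_skills * FRONTMATTER_TOKENS


def share_for_lines(lines: float) -> float:
    """Levels 2 and 3: context share for the body/reference lines actually loaded."""
    return lines * SHARE_PER_LINE


def lines_for_share(share: float) -> float:
    """Inverse of share_for_lines: how many lines a given share corresponds to."""
    return share / SHARE_PER_LINE


# Cost of the spine by itself, before any reference file is pulled in.
spine_share = share_for_lines(SPINE_LINES)
# Reference lines per trigger implied by the reported 7% figure.
avg_reference_lines = lines_for_share(REFACTORED_SHARE) - SPINE_LINES
# Ratio between the monolithic and refactored cost.
savings_factor = MONOLITH_SHARE / REFACTORED_SHARE

if __name__ == "__main__":
    print(f"frontmatter for 10 skills: {always_on_tokens(10)} tokens per turn")
    print(f"spine alone: {spine_share:.1%} of the window")
    print(f"implied reference lines per trigger: {avg_reference_lines:.0f}")
    print(f"savings factor: {savings_factor:.2f}x")
```

Under the linear assumption, the spine alone accounts for about 3% of the window, the 7% figure implies roughly 240 lines of reference material loaded per trigger on average, and the ratio works out to about 2.9x, which rounds to the 3x above.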
Originally posted by u/jimmytoan on r/ClaudeCode
