Today I was looking for a slash command and saw one I hadn’t noticed before called /claude-md-improver. I ran it and it found several ways to slim down my project’s claude.md file. Then I asked it to review my global file, and it found a lot there too. Nothing was removed; it just fixed redundancies and changed a few things to load only when needed instead of all the time.

So I had the bright idea to say “i just updated my claude.md files and it makes me wonder if there are other ways I should be saving tokens automatically.” Claude came up with:

- ⭐ Trim the skill + MCP registry per project.
- PreToolUse hook to redirect bulk Reads to ask-gandalf. (Gandalf is my local LLM running Qwen.)
- Pre-compact hook at 70% context.

Lower-impact but worth knowing:
- Prompt caching on the gateway’s Anthropic calls (if not already on - check mcp-prompts.ts). 90% discount on cached input tokens.
- OutputStyle set to terse mode for routine sessions; verbose only when explicitly needed.

One real gap: nothing logs usage.cache_read_input_tokens / cache_creation_input_tokens. You’re paying for caching but flying blind on hit rate. Adding that to the existing usage logger in usage_log is a 10-min change. Want me to do it?

I simply asked Claude to implement them all. I just did this, so I’m not sure of the exact reduction in token use yet, but I’ll report back.
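For anyone curious what the "PreToolUse hook to redirect bulk Reads" idea could look like, here is a rough Python sketch, not the hook Claude actually wrote for this setup. It assumes the hook receives a JSON event on stdin with `tool_name` and `tool_input` fields, that the Read tool's input uses `file_path`, `offset`, and `limit`, and that exiting with code 2 blocks the call and passes stderr back to Claude. The 50,000-byte threshold and the `--hook` flag are made up for illustration.

```python
#!/usr/bin/env python3
"""PreToolUse hook sketch: block large whole-file Reads and suggest ask-gandalf."""
import json
import os
import sys

# Hypothetical threshold for what counts as a "bulk" read; tune to taste.
MAX_BYTES = 50_000


def decide(event: dict, max_bytes: int = MAX_BYTES) -> tuple[int, str]:
    """Return (exit_code, message) for one hook event."""
    if event.get("tool_name") != "Read":
        return 0, ""
    tool_input = event.get("tool_input") or {}
    # A Read that already asks for a slice is not treated as a bulk read.
    if "offset" in tool_input or "limit" in tool_input:
        return 0, ""
    path = tool_input.get("file_path", "")
    try:
        size = os.path.getsize(path)
    except OSError:
        return 0, ""  # missing or unreadable file: let Read report its own error
    if size <= max_bytes:
        return 0, ""
    # Exit code 2 is assumed to block the tool call and show stderr to Claude.
    return 2, (
        f"{path} is {size} bytes. Use ask-gandalf to summarize it, "
        "or Read it with offset/limit."
    )


def main() -> int:
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code


# The explicit flag keeps the file inert when imported or run bare.
if __name__ == "__main__" and sys.argv[1:2] == ["--hook"]:
    sys.exit(main())
```

To try it, the script would be registered as a `command` hook under `hooks` → `PreToolUse` in settings.json with a `Read` matcher, invoked as something like `python3 /path/to/this_script.py --hook`. Keeping the decision logic in `decide` makes it easy to test without piping JSON through stdin.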
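The cache logging gap is the easiest one to sketch. The snippet below is a minimal Python illustration, not the actual gateway logger. It relies on the Messages API `usage` object reporting `input_tokens` (uncached), `cache_creation_input_tokens`, and `cache_read_input_tokens` separately, so the hit rate is cached reads divided by the sum of all three. The function names and the `usage_log.jsonl` path are hypothetical.

```python
import json
import time
from typing import Any, Mapping


def cache_stats(usage: Mapping[str, Any]) -> dict:
    """Summarize prompt-cache effectiveness from one API `usage` object."""
    # Cache fields can be missing or None when caching is not in play.
    uncached = usage.get("input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    total_input = uncached + written + read
    return {
        "input_tokens": uncached,
        "cache_creation_input_tokens": written,
        "cache_read_input_tokens": read,
        "output_tokens": usage.get("output_tokens") or 0,
        # Share of all input tokens that were served from the cache.
        "cache_hit_rate": round(read / total_input, 4) if total_input else 0.0,
    }


def log_usage(usage: Mapping[str, Any], path: str = "usage_log.jsonl") -> dict:
    """Append one timestamped record per API call (the path is a placeholder)."""
    record = {"ts": time.time(), **cache_stats(usage)}
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

For example, a call with 100 uncached input tokens and 900 cache-read tokens gives a hit rate of 0.9. With the official Python SDK, something like `log_usage(response.usage.model_dump())` should produce the mapping this expects, though that depends on the SDK version in use.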
Originally posted by u/benfinklea on r/ClaudeCode
