Original Reddit post

TLDR: I am saving tons of tokens with Haiku non-thinking and pal-mcp (OpenRouter) while using BMAD and Beads. Put $10 on OpenRouter to unlock higher usage limits on all the free models. Restrict the available models in the pal-mcp .env to only the free models. Tell Claude to “use pal…” at the beginning of your prompts. Voila.

I’m fairly new to Claude Code and currently only on the Pro sub, as I’m a hobbyist and don’t do any of this commercially or for a living. As I’ve fumbled my way around Claude Code I’ve found (like many users) that I’m hitting limits constantly; last week I even used up my weekly limit four days in. I’ve done a deep dive trying to figure out how to save tokens and make Claude more efficient, at least at the stage I’m in, which is brainstorming with BMAD.

Basically, I’ve chucked $10 at OpenRouter and use pal-mcp to access all of the free models available there. Before pal-mcp, even when all of my brainstorming was done with Haiku non-thinking, I would blow through my tokens like crazy and get maybe two sessions done (with /clear used correctly and Beads installed) before I’d reach the limit for my five-hour window (maybe an hour’s worth if I’m lucky). Now, I’ve been brainstorming in BMAD for three hours straight and am at only 5.8% of my token usage for this five-hour window (though I am almost at 50% message usage).

I just load the BMAD brainstormer, choose the technique, and tell it to do the initial pass with pal. Claude finds the most appropriate free model (I restricted the available models to only free models in the .env) and does the initial brainstorm with it, then returns its summary for me to answer questions about, adjust, challenge, and clarify. As someone who is technical enough to use the CLI but was completely lost as to how to get more value out of my tokens as a heavy user, this minimalistic setup is perfect for me.
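For anyone wanting to reproduce the "free models only" restriction, here is a minimal Python sketch of how the allow-list could be generated. It is not taken from the post. It assumes that OpenRouter's public model listing at `https://openrouter.ai/api/v1/models` returns a `data` array whose entries carry an `id` and a `pricing` object with string-valued `prompt` and `completion` prices. It also assumes that pal-mcp reads a comma-separated allow-list from a variable named `OPENROUTER_ALLOWED_MODELS`; check the `.env.example` that ships with pal-mcp for the real variable name before relying on it.

```python
import json
import urllib.request

# Assumed endpoint for OpenRouter's public model listing.
MODELS_URL = "https://openrouter.ai/api/v1/models"


def is_free(model: dict) -> bool:
    """Treat a model as free when both prompt and completion prices are zero."""
    pricing = model.get("pricing") or {}
    try:
        # Missing prices default to "1" so that unknown pricing is never
        # mistaken for free.
        prompt_price = float(pricing.get("prompt", "1"))
        completion_price = float(pricing.get("completion", "1"))
    except (TypeError, ValueError):
        return False
    return prompt_price == 0 and completion_price == 0


def free_model_ids(models: list) -> list:
    """Return the sorted ids of all zero-priced models."""
    return sorted(m["id"] for m in models if "id" in m and is_free(m))


def env_line(ids: list, var_name: str = "OPENROUTER_ALLOWED_MODELS") -> str:
    """Format the ids as a single .env line (the variable name is an assumption)."""
    return f"{var_name}={','.join(ids)}"


def fetch_models(url: str = MODELS_URL) -> list:
    """Download the model listing; assumes the payload has a top-level 'data' list."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["data"]
```

Running `print(env_line(free_model_ids(fetch_models())))` should print one line that can be pasted into the pal-mcp `.env`. The free lineup on OpenRouter changes over time, so the line would need regenerating occasionally.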
I have searched and searched this subreddit for token-efficiency setups that are simple enough for a hobbyist amateur solo dev/designer (I would not call myself a developer), so I felt compelled to put this out there in case anyone is in the same boat. That being said, I’ve not tested this on codebases, as I’ve only begun the brainstorming process for my application, but I’m optimistic. Maybe someone else has used this setup on codebases of varying sizes and could give some feedback? Is there anything else you would add to this?

Originally posted by u/savvylr on r/ClaudeCode