Original Reddit post

I never really thought the reason my AI gets dumber and forgets more in longer sessions might be me; I assumed it was just like that and there wasn't much I could do about it. But it kept annoying me. As an 18-year-old solo dev who pays for AI myself, I hate hitting my usage limits mid-session and breaking my flow, having to wait hours before I can continue. So I started asking: how can I fix this? What part of what I'm doing might be causing it? That's when I looked at my prompts. They were messy, unstructured, and full of typos, and I wondered whether they were hurting me more than I realized. So I started researching the negative effects my bloated prompts might be having on my sessions. Here's what I found:

**Prompt decay.** What it is: I'm sure some of you have experienced this. The longer your session goes, the more the AI starts cutting corners. Constraints and instructions you set early fade, answers get vague, and you start sending correction messages just to get it back on track.

Why your prompt causes it: LLMs have a limited context window; they can only remember so much. When you keep sending bloated prompts, your actual instructions get buried in filler. The model has to work harder to find what you want through the noise, and in a long session the signal fades faster than it should.

**Output quality mirrors input.** The AI reflects the structure and tone of what you give it: rambling input produces rambling output. The model might still give you the output you want, but it ends up costing you in more ways than one. Tight input forces tighter, more precise responses. You can try this out for yourself.

**You're paying for noise.** Every token counts, whether it's against your usage limits in the app or against your bill on the API. And it's not just input tokens: responses often run longer than the prompt that produced them, so if your bloated prompt is producing bloated responses, you're getting hit on both ends.
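The decay point above can be sketched in a few lines. This is a toy model, not how any real provider manages context: it assumes a hard token cap with a drop-oldest policy, and counts "tokens" by naive whitespace splitting. The cap, the messages, and the policy are all made up for illustration.

```python
from collections import deque

# Toy sketch of prompt decay: a context window capped at MAX_TOKENS with a
# drop-oldest policy. Real providers handle truncation differently; this
# only illustrates why early instructions are the first to disappear.
MAX_TOKENS = 50  # arbitrary illustrative cap

def tokens(msg: str) -> int:
    """Very rough proxy: one token per whitespace-separated word."""
    return len(msg.split())

history = deque()

def add_message(msg: str) -> None:
    history.append(msg)
    # Evict from the front until we fit: the earliest messages (your
    # system constraints) are exactly what gets dropped first.
    while sum(tokens(m) for m in history) > MAX_TOKENS:
        history.popleft()

add_message("SYSTEM: always answer in JSON")  # the constraint you set early
for i in range(10):
    add_message(f"USER: long rambling message number {i} with lots of filler words here")

print(history[0])  # the early system constraint has already been evicted
```

Bloated turns fill the cap faster, so the constraint you set at the start falls out of the window after fewer exchanges.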
**Fix the input and you cut both.** You've probably guessed it already: all three points trace back to one thing, input tokens. Every extra word in your prompt is a token: tokens that bury your instructions, tokens that shape a bloated response, tokens you're billed for on both ends. The fix is simpler than it sounds: write less. Be deliberate. Cut the filler and say what the AI needs, never more, never less. Get into the habit of doing that and I promise you'll feel the change.

But most people, myself included, don't write prompts that way, especially mid-session when you're thinking out loud and just need an answer fast. Having to weigh every word in my prompts is tiring. That's why I thought about building a tool to do exactly that. It's called Squaizer, and it does the stripping automatically with two hotkeys: select your prompt with CTRL+A (or with your cursor), then press the Fast Squeeze hotkey to replace the selected prompt in place. I built it using Claude Code, which is a bit ironic: using Claude to build a tool that makes you use your AI better. It keeps your constraints and signal intact and just strips the noise. The result is a tighter prompt that decays slower, produces cleaner output, and costs you fewer tokens on both the input and output side. It also shows you the before-and-after token count when squeezing, so you can see exactly how much you're saving.

There's a free demo, no signup needed, on the homepage: www.squaizer.com . Would love to know if I missed anything in the three points above, and whether you'd use such a tool. Happy to answer any questions.

submitted by /u/a7zxd_27
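The "billed on both ends" arithmetic can be made concrete with a toy calculator. Everything here is an assumption for illustration: token counts use naive whitespace splitting (real tokenizers differ), the per-token prices are placeholders (check your provider's actual pricing), and the response-size ratio is invented.

```python
# Toy illustration of paying for noise on both ends: compare a bloated
# prompt against a stripped one over a session. Prices and the response
# ratio are made-up placeholders, not real provider pricing.

PRICE_IN = 3.00 / 1_000_000    # hypothetical $ per input token
PRICE_OUT = 15.00 / 1_000_000  # hypothetical $ per output token (often pricier)

def rough_tokens(text: str) -> int:
    """Very rough proxy: one token per whitespace-separated word."""
    return len(text.split())

def session_cost(prompt: str, response_ratio: float, turns: int) -> float:
    """Estimated cost if every turn sends a similar-sized prompt and the
    model replies with response_ratio times as many tokens."""
    t_in = rough_tokens(prompt)
    t_out = int(t_in * response_ratio)
    return turns * (t_in * PRICE_IN + t_out * PRICE_OUT)

bloated = ("so basically what I want, and I know this might sound weird, "
           "is for you to maybe refactor this function, if that's ok, "
           "to use a dict lookup instead of all those if statements please")
tight = "Refactor this function to use a dict lookup instead of the if-chain."

for label, p in [("bloated", bloated), ("tight", tight)]:
    cost = session_cost(p, response_ratio=3.0, turns=50)
    print(f"{label}: {rough_tokens(p)} tokens, ${cost:.4f} over 50 turns")
```

Because the output side is modeled as a multiple of the input, trimming the prompt shrinks both terms at once, which is the whole argument in one expression.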

Originally posted by u/a7zxd_27 on r/ArtificialInteligence