LLMs are advertised with context windows of 128k to 1M+ tokens, but whenever I use a web interface I seem to get a brutally small context window, which results in amnesia in every important conversation I have!

I exported a chat from Gemini (this happens with ChatGPT and Claude too!) that was exactly 41k tokens long, then asked it to recall my first message to test its memory. The context had been aggressively truncated: about the first half of our conversation was gone.

I suppose they use a rolling context window to save compute, but I have three questions:

What is the actual hard token limit for these web UIs before they start silently deleting our history?
Why don’t we have a simple UI indicator showing how much of our chat is still actually “in memory”?
Am I forced to use the API if I want the AI to actually remember a whole session?

Maybe you guys can help me out with this, thanks!
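To make my guess concrete, here is a minimal sketch of what I imagine a rolling context window does, plus the kind of "in memory" number a UI indicator could show. This is purely my assumption, not how Gemini, ChatGPT, or Claude actually work. The function names are made up, and the 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer.

```python
from collections import deque


def estimate_tokens(text: str) -> int:
    # Hypothetical heuristic: roughly 4 characters per token.
    # Real tokenizers will give different counts.
    return max(1, len(text) // 4)


def rolling_window(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages whose estimated tokens fit within `budget`."""
    kept: deque[str] = deque()
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.appendleft(msg)
        used += cost
    return list(kept)


def fraction_in_memory(messages: list[str], budget: int) -> float:
    """Share of the chat (by estimated tokens) that survives the window."""
    total = sum(estimate_tokens(m) for m in messages)
    if total == 0:
        return 1.0
    kept = sum(estimate_tokens(m) for m in rolling_window(messages, budget))
    return kept / total
```

Under this sketch, a chat that is twice the size of the budget would report about half of it as still in memory, with the oldest messages being the ones that are gone, which matches what I saw when I asked for my first message.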
Originally posted by u/Accomplished_Hall561 on r/ArtificialInteligence
