This is probably the best article I've read on what 1M context windows actually change in practice. The biggest takeaway for me: don't just dump everything in. Filtering first (RAG, embeddings, whatever), then loading only what's relevant into the full window, beats naive context-stuffing every time; irrelevant tokens actually make the model dumber, not just slower. Some other things that stood out:
- performance degrades measurably past ~500K tokens even on opus 4.6
- models struggle with info placed in the middle of long contexts (“lost in the middle” effect)
- a single 1M-token prompt to opus costs ~$5 in API, adds up fast
- claude opus 4.6 holds up way better at 1M than GPT-5.4 or gemini on entity tracking benchmarks

Seriously bookmarking this one: https://leetllm.com/blog/million-token-context-windows
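The filter-then-load idea can be sketched in a few lines. This is a toy illustration, not the article's actual pipeline: it uses a bag-of-words overlap as a cheap stand-in for a real embedding model, ranks candidate chunks by cosine similarity to the query, and keeps only the top hits for the prompt. The function names and sample chunks are all made up for the example.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real pipeline would call an embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_context(query, chunks, top_k=2):
    # Rank all candidate chunks against the query, keep only the best top_k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

chunks = [
    "Entity tracking benchmarks across long documents.",
    "Quarterly revenue figures for an unrelated company.",
    "Lost in the middle effects in long context models.",
]
selected = select_context("long context entity tracking", chunks)
print(selected[0])
```

The point is only the shape of the approach: spend cheap compute filtering candidates down, then spend the expensive long-context call on the survivors instead of the whole corpus.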
Originally posted by u/Sea_Pitch_7830 on r/ClaudeCode
