Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)

Paper: https://research.trychroma.com/context-rot

Abstract: Large Language Models (LLMs) are typically presumed to process context uniformly—that is, the model should handle the 10,000th token just as reliably as the 100th. However, in practice, this assumption does not hold. We observe that model performance varies significantly as input length changes, even on simple tasks. In this report, we evaluate 18 LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. Our results reveal that models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows.

Authors: Kelly Hong, Anton Troynikov, Jeff Huber

Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher

If you want to support me, the best thing to do is to share out
