A new paper out of Harvard (Luo, King, Puett, Smith) introduces Recoding-Decoding (RD), a decoding scheme that pulls the long tail of an LLM's knowledge into actual outputs by injecting priming phrases and diverting tokens during the decoding stage.

How RD works: The authors argue that modern LLMs encode an enormous slice of human knowledge, but standard decoding (top-k, nucleus, etc.) only ever pulls from the peak of the conditional distribution. The long tail (heterodox, contrarian, non-Western, weird-but-relevant) sits unused. RD diverts the model off its modal path by:

1) Prepending a random "priming phrase" (e.g., "Related to FOOD: ", "Related to SKY: ")
2) Injecting a random 3-letter "diverting stem" (Pas, Tib, Mon, …) at the start of each new sentence

For example, "Brainstorm a world history topic" can now resolve to "[Pas]ta and the Silk Road" or "[Tib]etan sky burials" by absorbing the injected tokens [Pas] and [Tib], instead of generating the dominant answer of "Age of Enlightenment." (A rough sketch of this mechanism is at the end of the post.)

What they found: across 50 brainstorm topics plus 500 prompts from 5 public datasets, relevance stays around 0.99 while diversity grows almost linearly out to 1,000 runs. They also found that the stronger the LLM (Gemini-3 > GPT-5.1 > GPT-3.5 > DeepSeek-3), the larger RD's lead, because more capable models have more peaked distributions and thus more hidden tail knowledge.

Why it matters: The authors frame this as the "search quest" problem: picking a wedding dress, a research topic, a startup name, a school for a kid. The goal isn't the correct answer; it's learning the space. Current LLMs are anti-optimized for that, which the paper argues is quietly driving collective homogenization (they cite a striking incident where students using ChatGPT to outline essays turned in nearly identical arguments without ever talking to each other).

📄 Paper: https://arxiv.org/abs/2603.19519
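
For anyone who wants to see the idea concretely, here is a minimal Python sketch of the priming-phrase plus diverting-stem mechanism described above. The phrase list, stem list, and the generate_fn stub are illustrative stand-ins of my own, not the paper's actual pools or implementation.

```python
import random

# Illustrative pools only; the paper's actual phrase/stem lists are not reproduced here.
PRIMING_PHRASES = ["Related to FOOD: ", "Related to SKY: "]
DIVERTING_STEMS = ["Pas", "Tib", "Mon"]

def rd_prompt(user_prompt: str, rng: random.Random) -> str:
    """Prepend a random priming phrase to pull the model off its modal path."""
    return rng.choice(PRIMING_PHRASES) + user_prompt

def rd_generate(user_prompt: str, generate_fn, n_sentences: int = 2, seed: int = 0) -> str:
    """Sketch of RD decoding: prime the prompt, then force a random 3-letter
    stem at the start of each sentence and let the model absorb it.

    generate_fn(prefix) -> str is a stand-in for one sentence of model
    continuation (e.g., a thin wrapper around any completions API)."""
    rng = random.Random(seed)
    context = rd_prompt(user_prompt, rng)
    sentences = []
    for _ in range(n_sentences):
        stem = rng.choice(DIVERTING_STEMS)            # e.g., "Pas" -> "[Pas]ta and the Silk Road"
        sentence = stem + generate_fn(context + " " + stem)
        sentences.append(sentence)
        context += " " + sentence                     # feed the absorbed stem back as context
    return " ".join(sentences)

if __name__ == "__main__":
    # Toy stand-in generator so the sketch runs without any API key.
    def toy_generate(prefix: str) -> str:
        if prefix.endswith("Pas"):
            return "ta and the Silk Road shaped world trade."
        return "etan sky burials are an unusual world history topic."

    print(rd_generate("Brainstorm a world history topic.", toy_generate))
```

Swap toy_generate for a real model call and the same loop applies; the point is simply that the injected tokens become part of the prefix the model must continue from.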
Originally posted by u/ResponsibleLeg9220 on r/ArtificialInteligence
