OpenAI just dropped the GPT-5.6 Sol preview. I grabbed the TerminalBench 2.1 chart because the numbers looked off. On the coding benchmark, Sol Ultra is at 91.9% and base Sol is 88.8%. Claude Mythos 5 is next at 88.0%, then GPT-5.5 at 83.4%. The gap between Sol and GPT-5.5 stood out: 5.4 points for base Sol and 8.5 points for Sol Ultra. That is not a normal point-release gap.

The preview also claims stronger reasoning in science and cybersecurity. I have no way to check the safety stack claims myself. But OpenAI calling out safety upfront instead of hiding it in a system card feels like a shift, probably because the politics around model releases is hotter now.

What I actually care about is whether this shows up in real coding. Benchmarks reward one specific kind of correct completion. My daily work is messier: half-finished repos, vague tickets, tests that fail for legacy reasons no one remembers. GPT-5.5 was already decent at guessing intent on those. If Sol is meaningfully better at the long-horizon stuff, like planning a multi-file change and predicting which tests will break, that is where the extra points matter.

One thing I am less excited about: the usual hype cycle is already flooding the sub. Sol Ultra at 91.9% does not mean every task gets 91.9% solved. It means Sol Ultra solved 91.9% of the tasks in one specific coding benchmark. Keep the hype in check.

Has anyone here actually tried Sol or Sol Ultra? Curious if the real gap feels as big as the chart suggests.
Originally posted by u/Dense-Sir-6707 on r/ArtificialInteligence
