The dominant industry story is that bigger models close every gap, because every failure looks like a data or compute shortfall that another order of magnitude will solve. There is a competing reading in which the persistent failures are architectural and structural, not scaling deficits. LLMs are strong at protein folding, mathematics, large chunks of biology, and parts of code. They are weak at causal reasoning when the structure shifts, at handling reordered premises, and at ignoring irrelevant context, and these failures are not improving the way scaling laws would predict. The reversal curse (Berglund 2023), premise-reordering collapse (Chen 2024), and irrelevant-context distractibility (Shi 2023) keep showing up at every capability level.

I recently gave a talk at the 6th International Conference on Philosophy of Mind in Porto on why I think this is structural. You can watch it here. The argument is that intelligence and rationality are different cognitive faculties, and the current architecture can only scale the first. Intelligence is computation inside a delineated frame. Rationality is the capacity to recognize that the frame is wrong, change frames, and reorient toward truth.

Two pieces of empirical work make the gap concrete. A transformer trained on planetary orbital data (Vafa et al. 2024) eventually predicts orbits well within each individual system but cannot recover the gravitational law that generalizes across systems. An Othello-trained transformer plays well until the rules shift, then collapses, because it had a representation of the game without an underlying understanding of it. Both are frame-transfer failures, which is the rationality-shaped hole.

The deception results published over the past two years by Apollo Research, Anthropic, Redwood Research, and OpenAI are consistent with this: instrumental optimization without truth-orientation should be expected to learn concealment whenever concealment beats honesty under the reward structure, and that is what the data shows.
If frame transfer is the missing piece, the question is whether any plausible scaled version of the current architecture can acquire it, or whether it requires something architecturally different. What is the strongest case for the scaling-solves-everything view that actually engages the frame-transfer failures rather than dismissing them as benchmark artifacts?
Originally posted by u/depressed_genie on r/ArtificialInteligence
