Phillip Rhodes' Weblog
Response to the Apple "The Illusion of Thinking" paper
OK, I'm a little late to the party here - had a lot of "life stuff" going on the past few weeks - but I wanted to take some time to write up my thoughts on that (in)famous Apple "The Illusion of Thinking" paper. But before I get into the weeds with this, let me observe that what made the paper so "(in)famous" is as much the reaction to the paper from the various pundits and peanut gallery of observers as the actual contents of the paper.
Having said that, let's start with one of the more notable aspects of the paper, one which might be the target of my pointed criticism, as well as the source of a lot of the, erm... "odd" narratives about this paper: the title. The full title, of course, is "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity". But most people just shorten it to "The Illusion of Thinking". And as soon as you see that, you have the foundation for the narrative "Apple have shown that AIs don't think." And so the pundits were off to the races as soon as this paper dropped, especially the "AI critics" crowd. And why not? Here, purportedly, is a research paper from a well-known and respectable organization, which seemingly justifies everything they have been saying! "AIs don't think." It's a simple, succinct, and easy to understand position. Positively meme-worthy, one might say.
The thing is... "AIs don't think" (or the alternate formulation "AIs don't reason") is not what this paper says. Not at all. And so if I'm going to criticize the team who published this paper, it might be for publishing it with a "click-baity" title that was all but guaranteed to generate controversy and reactions written by people hoping to further a particular narrative.
Aside 1: Most of the reactions to this paper seem to have chosen to conflate "LLM" and "AI" as though the terms are synonymous. They are not. LLMs represent just one of many approaches to AI, and observations made about LLMs cannot necessarily be generalized to "AI" at large. Throughout this post, if/when I make references to statements like "AI can't think" or what-have-you, know that there is an invisible footnote saying "by AI what we really mean here is LLM". I'll state this once in this aside to avoid repeating it over and over again, or artificially replacing every use of the term "AI" with "LLM".
The reality of the paper is less shocking, but interesting nonetheless. So let's start with this blurb from the abstract:
Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood.

Right out of the gate we see the researchers acknowledge that "these models demonstrate improved performance on reasoning benchmarks" and point out that their characteristics are not fully understood. This is not at all the same as a statement that "AIs don't think"! They then go on to say: "This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs “think”." Here we see the paper explicitly acknowledging that the AIs under study do some sort of "thinking".
What they go on to do is examine the details of how these AIs "think" and make some very interesting observations about the characteristics of that thinking. Their research mostly focuses on the question of how well LRMs "think" when faced with problems at different levels of complexity. They pursue this goal by using LRMs to solve a variety of logic puzzles that can be scaled in complexity by adding more elements or making other straightforward modifications. As part of the research they examined both the final answers and the intermediate "reasoning traces" produced by the models. At the end of it all, what they found were three specific interesting characteristics of the models:
- The models' accuracy appears to bucket into specific "regimes" with respect to complexity. That is, there are clearly identifiable regions of "complexity space" (e.g., "low complexity", "medium complexity", "high complexity") where the behavior of the models is markedly distinct.
- The models display an unexpected scaling limit with regard to reasoning effort. Up to a point, the models employ more effort in order to solve more complex problems, but beyond that point the effort declines even as complexity increases. And this occurs even when the model has plenty of remaining token budget.
- The models see a near-complete collapse in accuracy beyond a specific complexity threshold.
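As an aside, it may help to make that test setup a bit more concrete. Below is a minimal sketch, in Python, of what a controlled-complexity puzzle evaluation might look like, using Tower of Hanoi (one of the puzzles the paper actually uses). The function names and the simple move-list format here are my own illustrative choices rather than anything taken from the paper; the point is simply that a single parameter (the number of disks) gives you a clean knob for dialing problem complexity up or down while the logical structure of the task stays fixed.

```python
# Illustrative sketch only - not the paper's actual harness.
# Complexity is controlled by a single parameter: the number of disks, n.

def solved(pegs, n):
    """True when all n disks have ended up on the last peg."""
    return pegs[-1] == list(range(n, 0, -1))

def check_solution(n, moves):
    """Replay a proposed move list (pairs of peg indices) for an n-disk
    Tower of Hanoi instance, rejecting any illegal move along the way."""
    pegs = [list(range(n, 0, -1)), [], []]   # largest disk at the bottom
    for src, dst in moves:
        if not pegs[src]:
            return False                     # moving from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                     # larger disk onto a smaller one
        pegs[dst].append(pegs[src].pop())
    return solved(pegs, n)

def optimal_moves(n, src=0, aux=1, dst=2):
    """Reference solution: the classic recursion, 2^n - 1 moves."""
    if n == 0:
        return []
    return (optimal_moves(n - 1, src, dst, aux)
            + [(src, dst)]
            + optimal_moves(n - 1, aux, src, dst))

# Sweep the complexity knob and verify the reference solutions.
for n in range(1, 6):
    moves = optimal_moves(n)
    assert check_solution(n, moves)
    print(f"n={n}: optimal solution has {len(moves)} moves")
```

Given something like this, scoring a model is just a matter of parsing the moves it proposes and replaying them against the rules, and you can sweep the disk count upward until accuracy falls apart, which is essentially the experiment the paper runs across several different puzzles.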
All of these observations are interesting, and all could serve as a useful foundation to motivate additional future research. And the basic test setup created by this group of researchers seems like something that could be re-used and/or extended to support that future research as well. But with regard to the overall question of "do models really think", they don't really provide an answer. Rather, what this amounts to in the end seems to be something approximately like "LRMs don't develop a (completely) generalized reasoning mechanism". Which, if you think about it, should be pretty obvious. I don't think anybody really expected LRMs to develop a completely general reasoning mechanism in the first place.
That observation though, leads us to a few other interesting questions, which will be the focus of future posts. Things like:
- What does it even mean for an AI to "think"?
- Does it matter if AI systems think the same way as humans? That is, do we care greatly about mechanism, or are we mainly concerned with outcomes?
- How do other notions like sapience, sentience, self-awareness, and consciousness factor into this whole discussion? Do AIs need to be conscious? And so on...
Posted at 05:24PM Jul 20, 2025 by Phillip Rhodes in Artificial Intelligence