Phillip Rhodes' Weblog
Response to the Apple "The Illusion of Thinking" paper
OK, I'm a little late to the party here - had a lot of "life stuff" going on the past few weeks - but I wanted to take some time to write up my thoughts on that (in)famous Apple "The Illusion of Thinking" paper. But before I get into the weeds, let me observe that what made the paper so "(in)famous" is as much the reaction to it from the various pundits and the peanut gallery of observers as the actual contents of the paper.
Having said that, let's start with one of the more notable aspects of the paper, one which might be the target of my pointed criticism, as well as the source of a lot of the, erm... "odd" narratives about this paper: the title. The full title, of course, is "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity". But most people just shorten it to "The Illusion of Thinking". And as soon as you see that, you have the foundation for the narrative "Apple have shown that AI's don't think." And so the pundits were off to the races as soon as this paper dropped, especially the "AI critics" crowd. And why not? Here, purportedly, is a research paper from a well known and respectable organization, which seemingly justifies everything they have been saying! "AI's don't think." It's a simple, succinct, and easy to understand position. Positively meme-worthy, one might say.
The thing is... "AI's don't think" (or the alternate formulation "AI's don't reason") is not what this paper says. Not at all. And so if I'm going to criticize the team who published this paper, it might be for publishing it with a "click-baity" title that was all but guaranteed to generate controversy and reactions written by people hoping to further a particular narrative.
Aside 1: Most of the reactions to this paper seem to have chosen to conflate "LLM" and "AI" as though the terms are synonymous. They are not. LLM's represent just one of many approaches to AI, and observations made about LLM's cannot necessarily be generalized to "AI" at large. Throughout this post, if/when I make references to statements like "AI can't think" or what-have-you, know that there is an invisible footnote saying "by AI what we really mean here is LLM". I'll state this once in this aside to avoid having to repeat it over and over again, or having to artificially replace every use of the term "AI" with "LLM".
The reality of the paper is less shocking, but interesting nonetheless. So let's start with this blurb from the abstract:
"Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood."

Right out of the gate we see the researchers acknowledge that "these models demonstrate improved performance on reasoning benchmarks" and point out that their characteristics are not fully understood. This is not at all the same as a statement that "AI's don't think"! They then go on to say: "This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs 'think'." Here we see the paper explicitly acknowledging that the AI's under study do some sort of "thinking".
What they go on to do is examine the details of how these AI's "think" and make some very interesting observations about the characteristics of that thinking. Their research mostly focuses on how well LRM's "think" when faced with problems at different levels of complexity. They pursue this goal by having LRM's solve a variety of logic puzzles that can be scaled in complexity by adding more elements or making other straightforward modifications (a rough sketch of what that kind of setup might look like follows the list below). As part of the research they examined both the final answers and the intermediate "reasoning traces" produced by the models. At the end of it all, what they found were three specific, interesting characteristics of the models:
- The models' accuracy appears to fall into distinct "regimes" with respect to complexity. That is, there are clearly identifiable regions of "complexity space" (e.g., "low complexity", "medium complexity", "high complexity") where the behavior of the models is markedly distinct.
- The models display an unexpected scaling limit with respect to reasoning effort. Up to a point, the models expend more effort to solve more complex problems, but beyond that point the effort declines even as complexity increases. And this occurs even when the model has plenty of remaining token budget.
- The models see a near-complete collapse in accuracy beyond a specific complexity threshold.
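To make that setup a bit more concrete, here's a rough sketch of what a "scalable puzzle" harness could look like, using Tower of Hanoi (one of the puzzle families the paper uses) as the example. To be clear, this is my own illustration, not the authors' code: "ask_model" is a hypothetical stand-in for whatever model API you would actually call, and the paper's real evaluation is considerably more careful than this.

# Rough, illustrative sketch only -- not the paper's actual harness.
# "ask_model" is a hypothetical placeholder for whatever LLM/LRM API you'd call.

def hanoi_solution(n, source="A", target="C", spare="B"):
    """Ground-truth move list for an n-disk Tower of Hanoi instance."""
    if n == 0:
        return []
    return (hanoi_solution(n - 1, source, spare, target)
            + [(n, source, target)]
            + hanoi_solution(n - 1, spare, target, source))

def ask_model(prompt):
    """Placeholder: wire this up to a real model client of your choosing."""
    raise NotImplementedError

def evaluate(max_disks):
    """Turn the complexity knob (disk count) and record how the model's answers hold up."""
    results = {}
    for n in range(1, max_disks + 1):
        prompt = ("Solve Tower of Hanoi with {} disks. "
                  "List one move per line as: disk, from_peg, to_peg".format(n))
        answer = ask_model(prompt)
        moves = [line for line in answer.splitlines() if line.strip()]
        # Naive scoring: compare move counts against the optimal solution.
        # A real harness would validate every individual move against the puzzle rules.
        results[n] = len(moves) == len(hanoi_solution(n))
    return results

The point of puzzles like this is that "complexity" becomes a single knob (here, the number of disks) that you can turn while holding everything else constant, which is what makes it possible to talk about distinct complexity regimes in the first place.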
All of these observations are interesting, and all could serve as a useful foundation for future research. And the basic test setup created by this group of researchers seems like something that could be re-used and/or extended to support that future research as well. But on the overall question of "do models really think", they don't really provide an answer. Rather, what this amounts to in the end is something approximately like "LRM's don't develop a (completely) generalized reasoning mechanism". Which, if you think about it, should be pretty obvious. I don't think anybody really expected LRM's to develop a completely general reasoning mechanism in the first place.
That observation, though, leads us to a few other interesting questions, which will be the focus of future posts. Things like:
- What does it even mean for an AI to "think"?
- Does it matter if AI systems think the same way as humans? That is, do we care greatly about mechanism, or are we mainly concerned with outcomes?
- How do other notions like sapience, sentience, self-awareness, and consciousness factor into this whole discussion? Do AI's need to be conscious? And so on...
Posted at 05:24PM Jul 20, 2025 by Phillip Rhodes in Artificial Intelligence
AI is not magic
I have a T-shirt I sometimes wear that reads "AI is like magic... but real." It's a nice bit of quirky, harmless fun. I wear it because it amuses me, and as a conversation starter. But it's a joke. AI is NOT magic.
Why is this an important observation? Simply because too many people get caught up in the hype, and start to think that they can just cast the magic incantation "Use AI" and all of their problems will be solved. But it does not work that way. Building systems using AI is still hard work and still requires (depending on exactly what you're building) a lot of knowledge, expertise, and engineering talent. Not to mention a healthy dose of patience.
Even in this modern world of LLM's, building a system that does something moderately complicated can require very complex code and intricate engineering tradeoffs between, say, token usage and answer quality. Latency is another factor that can really ruin your day if you're not careful.
When you're knee deep in building one of those things, you may find yourself having to choose between prompting strategies like "Chain of Thought" or "Tree of Thought" or "Skeleton of Thought" or "Chain of Feedback" or "Chain of Draft" or... anyway, you get the idea. And it's not just choosing the strategy, it's everything else implied by that choice. For example, using the Skeleton of Thought pattern implies writing additional code to process the "skeletal" answers, make additional subsequent LLM calls (possibly in parallel), and assemble the final answer. This isn't something you just knock out in five minutes without serious consideration.
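To give a flavor of what that extra code looks like, here is a bare-bones sketch of a Skeleton of Thought style flow in Python. This is just an illustration built on my own assumptions, not a reference implementation: "call_llm" is a hypothetical placeholder for your actual model client, and a real system would need prompt tuning, output parsing, retries, and token budgeting on top of it.

from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt):
    """Placeholder: plug in your actual model client here."""
    raise NotImplementedError

def skeleton_of_thought(question, max_workers=4):
    # Step 1: ask for a short "skeleton" -- outline points only, no detail.
    skeleton = call_llm(
        "Give a concise numbered outline (3-7 points, a few words each) "
        "answering the following question, one point per line.\n" + question)
    points = [p.strip() for p in skeleton.splitlines() if p.strip()]

    # Step 2: expand each skeleton point with its own LLM call, in parallel.
    def expand(point):
        return call_llm(
            "Question: " + question + "\nOutline point: " + point +
            "\nExpand this point into one or two detailed sentences.")

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        expansions = list(pool.map(expand, points))

    # Step 3: assemble the expanded points into the final answer.
    return "\n".join(expansions)

Even in this toy form the tradeoffs show up: you have swapped one model call for N+1 calls, plus the coordination code, in exchange for (hopefully) better structure and lower wall-clock latency on the expansion step.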
And the same basic premise applies to so many aspects of AI... or really, to digital technology in general. In fact, this whole essay could probably have been written 15 years ago with "Message Queues are not magic" as the title, and 20 years ago with "XML is not magic" as the title, and so on.
If anything, this is a plea to stop, take your time, look beneath the surface, and understand - and truly internalize - the idea that "the Devil is in the details."
Posted at 06:31PM May 14, 2025 by Phillip Rhodes in Artificial Intelligence