Phillip Rhodes' Weblog

Sunday July 20, 2025

Response to the Apple "The Illusion of Thinking" paper

OK, I'm a little late to the party here - had a lot of "life stuff" going on the past few weeks - but I wanted to take some time to write up my thoughts on that (in)famous Apple "The Illusion of Thinking" paper. But before I get into the weeds with this, let me observe that what made the paper so "(in)famous" is as much the reaction to it from the various pundits and the peanut gallery of observers as the actual contents of the paper.

Having said that, let's start with one of the more notable aspects of the paper, one which might be the target of my pointed criticism, as well as the source of a lot of the, erm... "odd" narratives about this paper. And that is the title. The full title, of course, is: "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity". But most people just shorten it to "The Illusion of Thinking". And as soon as you see that, you have the foundation for the narrative "Apple has shown that AI's don't think." And so the pundits were off to the races as soon as this paper dropped, especially the "AI critics" crowd. And why not? Here, purportedly, is a research paper from a well-known and respectable organization, which seemingly justifies everything they have been saying! "AI's don't think." It's a simple, succinct, and easy to understand position. Positively meme-worthy, one might say.

The thing is... "AI's don't think" (or the alternate formulation "AI's don't reason") is not what this paper says. Not at all. And so if I'm going to criticize the team who published this paper, it might be for publishing it with a "click-baity" title that was all but guaranteed to generate controversy and reactions written by people hoping to further a particular narrative.

Aside 1: Most of the reactions to this paper seem to have chosen to conflate "LLM" and "AI" as though the terms are synonymous. They are not. LLM's represent just one of many approaches to AI, and observations made about LLM's cannot necessarily be generalized to "AI" at large. Throughout this post, if/when I make references to statements like "AI can't think" or what-have-you, know that there is an invisible footnote saying "by AI what we really mean here is LLM". I'll state this once in this aside to avoid having to repeat it over and over again, or artificially replacing every use of the term "AI" with "LLM".

The reality of the paper is less shocking, but interesting nonetheless. So let's start with this blurb from the abstract:

Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood.
Right out of the gate we see the researchers acknowledge that "these models demonstrate improved performance on reasoning benchmarks" and point out that their characteristics are not fully understood. This is not the same as a statement that "AI's don't think" at all! They then go on to say "This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs “think”." Here we see the paper explicitly acknowledging that the AI's under study do some sort of "thinking".

What they go on to do is examine the details of how these AI's "think" and make some very interesting observations about the characteristics of that thinking. Their research mostly focuses on the question of how well LRM's "think" when faced with problems at different levels of complexity. They pursue this goal by using LRM's to solve a variety of logic puzzles that can be scaled in complexity by adding more elements or making other straightforward modifications. As part of the research they examined both the final answers and the intermediate "reasoning traces" produced by the models. At the end of it all, they found three particularly interesting characteristics of the models:

  1. The models' accuracy appears to bucket into specific "regimes" with regard to complexity. That is, there are clearly identifiable regions of "complexity space" (e.g., "low complexity", "medium complexity", "high complexity") where the behavior of the models is markedly distinct.
  2. The models display an unexpected scaling limit with regards to reasoning effort. Up to a point, the models employ more effort in order to solve more complex problems, but after that point the effort declines even as complexity increases. And this occurs even when the model has plenty of remaining budget for token usage.
  3. The models see near complete collapse in accuracy beyond a specific complexity threshold.

All of these observations are interesting, and all could serve as a useful foundation to motivate additional future research. And the basic test setup created by this group of researchers seems like something that could be re-used and/or extended to support additional future research as well. But in regards to the overall question of "do models really think", they don't really provide an answer. Rather, what this amounts to in the end seems to be something approximately like "LRM's don't develop a (completely) generalized reasoning mechanism". Which, if you think about it, should be pretty obvious. I don't think anybody really expected LRM's to develop a completely general reasoning mechanism in the first place.

That observation though, leads us to a few other interesting questions, which will be the focus of future posts. Things like:

  • What does it even mean for an AI to "think"?
  • Does it matter if AI systems think the same way as humans? That is, do we care greatly about mechanism, or are we mainly concerned with outcomes?
  • How do other notions like sapience, sentience, self-awareness, and consciousness factor into this whole discussion? Do AI's need to be conscious? And so on...
I think it goes without saying that we don't necessarily have complete answers to all of those questions yet. But that doesn't mean that there isn't anything we can say on these matters. Please subscribe to our RSS feed and look for future posts that dive into this discussion, and much more!

Wednesday May 14, 2025

AI is not magic

I have a T-shirt I wear sometimes, that reads "AI is like magic... but real." It's a nice bit of quirky, harmless fun. I wear it because it amuses me, and as a conversation starter. But it's a joke. AI is NOT magic.

Why is this an important observation? Simply because too many people get caught up in the hype, and start to think that they can just cast the magic incantation "Use AI" and all of their problems will be solved. But it does not work that way. Building systems using AI is still hard work and still requires (depending on exactly what you're building) a lot of knowledge, expertise, and engineering talent. Not to mention a healthy dose of patience.

Even in this modern world of LLM's, building a system that does something moderately complicated can require very complex code, and intricate engineering tradeoffs between, say, token usage and answer quality. Latency is another factor that can really ruin your day if you're not careful.

When you're knee deep in building one of those things, you may find yourself having to choose between a prompting strategy like "Chain of Thought" or "Tree of Thought" or "Skeleton of Thought" or "Chain of Feedback" or "Chain of Draft" or... anyway, you get the idea. And it's not just choosing the strategy, it's everything else implied by that choice. For example, using the Skeleton of Thought pattern implies writing additional code to process the "skeletal" answer, make additional LLM calls (possibly in parallel) to flesh out each point, and then assemble the final answer. This isn't something you just knock out in five minutes without serious consideration.
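
To make that concrete, here's a rough sketch of what a Skeleton of Thought pipeline might look like. This is purely illustrative: call_llm is a placeholder for whatever client library you actually use (it isn't any particular vendor's API), the prompts are toy prompts, and the skeleton parsing is far too naive for real use.

from concurrent.futures import ThreadPoolExecutor


def call_llm(prompt: str) -> str:
    """Placeholder: wire this up to a real LLM client."""
    raise NotImplementedError


def skeleton_of_thought(question: str, max_workers: int = 4) -> str:
    # 1. Ask for a terse "skeleton": just a short numbered outline.
    skeleton = call_llm(
        "Give a short numbered outline (3-7 points, a few words each) "
        f"answering the question: {question}"
    )

    # 2. Parse the skeleton into individual points. Real code needs much
    #    more robust parsing than splitting on newlines.
    points = [line.strip() for line in skeleton.splitlines() if line.strip()]

    # 3. Expand each point with its own LLM call, in parallel.
    def expand(point: str) -> str:
        return call_llm(
            f"Question: {question}\n"
            f"Expand this outline point into a full paragraph: {point}"
        )

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        expansions = list(pool.map(expand, points))

    # 4. Assemble the final answer from the expanded points.
    return "\n\n".join(expansions)

Even this toy version has to deal with parsing, parallelism, and N+1 calls' worth of token spend and latency; the real thing also needs retries, rate-limit handling, and some way to sanity-check the skeleton before fanning out. That's exactly the kind of detail that disappears when people wave the "just use AI" wand.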

And the same basic premise applies to so many aspects of AI... or really, to digital technology in general. In fact, this whole essay could probably have been written 15 years ago with "Message Queues are not magic" as the title, and 20 years ago with "XML is not magic" as the title, and so on.

If anything, this is a plea to stop, take your time, look beneath the surface, and understand - and truly internalize - the idea that "the Devil is in the details."

Tuesday April 29, 2025

This blog got ransomwared

I'm generally a big believer in "learning in public" and emphasizing transparency, for numerous reasons. I won't detail those reasons now, but that might make a good blog post in its own right. But for now, I'm just going to share something that happened to me in the process of setting up this blog, where I made a series of mistakes that led to the initial install of my blog getting ransomwared!

Before getting into the nitty gritty, let me just say that the whole ransomware thing was a nothing-burger. The only content on the blog was a "Hello World" post I made basically to test that the Roller install was up and running. Let me also say early on that the attack that happened was successful due solely to my carelessness and is in no way an indictment of Roller, or Tomcat, or Postgresql or any of the other parts of my stack.

Anyway, the story. On Sunday evening I finished writing all the Ansible roles, playbooks, and bash scripts that I need to deploy this server. I ran the script, saw the blog server come up successfully, and wrote the aforementioned "Hello World" post. It was late, I was tired, and so I opened a text file in the project directory, wrote some notes to myself about "punch list" items to finish later, and went off to do other things.

The next morning I happened to mention this new blog in a comment on a Hacker News thread (one of those "What are you working on?" threads that pop up from time to time). As a sanity check, I clicked the link I posted... and it didn't work. Huh?

After spinning for a while, I finally got an error message saying "database rollerdb does not exist." I was briefly bewildered since I knew the db was working fine about 12 hours earlier and I knew I had not touched anything since then. A creeping suspicion started to crawl into my consciousness.

I ssh'd into the server, verified that Postgresql was still running, and then fired up psql and listed the databases. And lo and behold, there was no rollerdb database present. Instead there was a database named "readme_to_recover". That nagging suspicion started to become a blaring alarm. But I held out some small hope that this was just a Postgresql failure and that creating a database with that name was part of some error handling routine.

Plowing on, I connected to that database, listed the tables, and saw one named "readme". Doing a quick `select * from readme`, I was greeted with a message approximately like

Your content has been encrypted. To recover, send 0.13 BTC to the following address ...

I'd been ransomwared. First time this has ever happened to me, so I was in a bit of disbelief for a few minutes. And this server had been up for less than 24 hours even!

My thoughts quickly turned to "how did they get in?" and I started investigating the state of my server. Now, I had a hunch pretty early on, and once I'd confirmed the ransomware situation, I basically jumped straight to trying to confirm or deny that hunch. And it went something like this:
"I bet I had Postgreql bound to all IP interfaces on this host by accident, and probably had the postgres user set for "trust" authentication".
Of course that still wouldn't have explained why the firewall didn't block access on port 5432, but it seemed like a good place to start. So I jumped over to the Postgresql config dir, checked my postgresql.conf file, and found this:

listen_addresses = '*'

At that point the rest was pretty much a foregone conclusion. I moved on to checking the pg_hba.conf and found this line:

host all postgres 0.0.0.0/0 trust

OK, not much question remaining now, aside from "How could I be so stupid?" (we'll get to that in a minute) and "What about the firewall?"

I did a quick firewall-cmd --list-ports and was informed that the firewalld service was not running. And there ya go. I tried a "telnet philliprhodes.name 5432" from my laptop and was greeted with a connection banner. Anybody in the entire world could connect to my database, as the admin user, with no authentication. Not my proudest moment, but I kinda knew why this happened, and more to the point, I knew (at least some of) the steps I needed to take to make sure it didn't happen again.

First things first, I edited the postgresql.conf to change it to only bind to 127.0.0.1, and changed the pg_hba.conf to only authenticate tcp connections for user "postgres" from 127.0.0.1 as well. I didn't immediately change the authentication from "trust" to something else only because that would require changes on the application side as well and I just wanted to get things back up and running in a somewhat more secure fashion.
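
For reference, the "somewhat more secure" version boiled down to two one-line changes (paraphrasing my actual files). In postgresql.conf:

listen_addresses = '127.0.0.1'

And in pg_hba.conf, restricting the postgres user to local TCP connections only:

host all postgres 127.0.0.1/32 trust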

I redeployed everything and then did a server reboot (because I wanted to see if the firewalld service was starting correctly at bootup). After a reboot firewalld was indeed running as expected and port 5432 was no longer open.

So... what exactly happened here? As somebody who has been doing this stuff for 25+ years and who thinks of himself as slightly competent, how did I manage to make such an egregious series of mistakes? And what am I doing in response? Glad you asked.

For starters, I basically know how I got there. I have a habit of sitting in random coffee shops / cafes / etc. and ssh'ing into my server(s) to work on stuff. And at some point, I wanted to do something on one of my servers using psql directly over IP. So I created a postgresql.conf file with that bind configuration, and a pg_hba.conf with that authentication configuration, and deployed those files. And I made everything so generically open so I wouldn't have to worry about knowing the IP address I was on, or dealing with it changing, etc. That's just laziness on my part sadly.

That all almost certainly got reverted on the server later (I say that because none of my other servers have had similar problems, but we'll come back to that later as well). Anyway, those files apparently made it into a directory of sample files I keep around on my laptop to crib from when setting up Postgresql. And I copied those blindly when creating my Ansible role for this. The firewall thing? Something similar, I'm sure. It was correctly set to start on boot, so I'm pretty sure I shelled in one day to do something, stopped the service for some temporary manual fiddling, and then just plain forgot to restart it. And the server had not been rebooted since then.

So in the end, I somehow managed to make three ridiculous mistakes, where any two of them alone would probably have NOT resulted in my server getting ransomwared.

So what now? Well, the new server has been running fine for 24+ hours now, so I'm pretty sure whatever script-kiddie ransomware port scanner found my vulnerable server is no longer messing with me. But I don't want this to happen again, so what to do? Well, there are probably a million things and I probably don't know all of them, but here are some things I have already done, or plan to do:

  1. Quit doing so much "manual fiddling" on servers in the first place. That's part of the reason I'm going down the Ansible path to begin with. I want all my servers to be completely deployable through "known good" (more or less) automation processes. The goal is to get to where any server I maintain can be completely rebuilt by re-provisioning the underlying VM and then running the associated script that triggers Ansible.
  2. Set up automated backups where the backup data is pulled by another server that is well hardened and never even mentioned on the server being backed up.
  3. Add an nmap port-scan to the end of my bash scripts that do server configuration. So every. single. time. I fire the script against a server, the last thing that happens is nmap scans for open ports and dumps that right in my face. I've already implemented this, and had I had this earlier I would have noticed the problem immediately.
  4. Quit using the postgres user for applications. This is a really bad habit of mine and one I need to break. Likewise, I'm going to quit relying on "trust" authentication and start always mandating a password (even for connections from localhost). There's a sketch of what that pg_hba change looks like just after this list.
  5. Set up routine automated port-scanning for all of my servers. Like I said, none of the others have ever had something like this happen, but I can't assume that everything is perfect on those either. So at some point I'm setting up a scheduled job to run at least a port scan, if not some deeper automated security checks and point it at every server I maintain.
  6. Until (5) is done, I'm going to do a manual audit of all of my other servers for any obvious fuck-ups like this, and fix any such problems I find.
  7. Start running Postgresql on a non-standard port. I know, I know... "security through obscurity". And yet... I strongly suspect I got tagged by an automated scanner just crawling around looking for open ports like 5432 and 3306 and suchlike. There's a decent chance that if I'd been running Postgresql on 5999 or 7379 or something, this wouldn't have happened.
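
To make item (4) concrete: the pg_hba.conf side of that change is small. Something along these lines, where "rollerapp" is just a stand-in for whatever dedicated application user I end up creating:

host all rollerapp 127.0.0.1/32 scram-sha-256

scram-sha-256 is the preferred password-based authentication method in recent Postgresql versions, so the application has to present an actual password instead of being waved through on "trust".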

Anyway that's the story of how my shiny new blog got hit with a ransomware attack. Sadly (for the attackers) I won't be paying the ransom. And hopefully the lessons I learned from all of this mean I won't have to in the future either.

Monday April 28, 2025

Hello, Blogosphere!

If you're reading this, congratulations! You've made your way to my new weblog (powered by Apache Roller!). More to come...
