Does GPT "understand"? Intuition, Tokens, and the Meaning Debate - part 1


Researchers are actively investigating techniques to understand why
LLMs say what they say and how we might better control them. This area of research is called AI Explainability.

It's interesting for a number of reasons. But most importantly, the whole foundation of AI Explainability rests on the idea that "AI does not just predict the next word" - that AI displays human-like cognition.

I have often said the opposite out loud to newbies making their first entry into the world of AI - "LLMs are just huge models that predict the next word". It's like a slot machine. Every time you pull the lever, the neural net (NN) "finds the order" of words. Bingo. Prediction machines. This demystifies AI in five minutes and cuts down on scaremongering.

But in hindsight, that framing might be a bit too reductive.

Let me unpack this a little more, because this idea actually walks across the fault line between engineering, philosophy of mind, and cognitive science. The idea that LLMs are just "prediction machines" smuggles in a lot of sneaky metaphysics.

Beyond Next-Token Prediction

The slogan “AI does not just predict the next word” captures something importantly right about LLMs like GPT - and something dangerously misleading.

It's what Hinton, LeCun and the other "Godfathers" of deep learning have been discussing for some time. Interestingly, these Godfathers really do span a wide spectrum of intellectual interpretations of how LLMs actually behave.

On the one hand, it is trivially true that these models are next‑token predictors. The training objective is brutally simple: predict the next token with as little prediction error as possible over massive data corpora.
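
To make "prediction error" concrete, here is a minimal sketch of that objective in Python - a toy vocabulary and made-up logits, nothing GPT-specific - just to show that training boils down to cross-entropy between the model's predicted distribution over the vocabulary and the token that actually came next.

```python
import numpy as np

# Toy sketch of the next-token objective (illustrative only, not GPT's code).
# A language model maps a context to a probability distribution over the
# vocabulary; training minimizes the cross-entropy between that distribution
# and the token that actually followed in the corpus.

vocab = ["the", "cat", "sat", "on", "mat"]

def softmax(logits):
    z = logits - logits.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Pretend these are the model's raw scores (logits) for the context "the cat".
logits = np.array([0.1, 0.2, 2.5, 0.3, 0.0])   # the model leans towards "sat"
probs = softmax(logits)

next_token = "sat"                     # what actually came next in the corpus
target = vocab.index(next_token)

loss = -np.log(probs[target])          # cross-entropy for this one position
print(f"p('sat' | 'the cat') = {probs[target]:.3f}, loss = {loss:.3f}")

# Training is just this, repeated over a vast number of (context, next-token)
# pairs, nudging the weights so the loss goes down.
```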

But here is the flip: there's no explicit symbol calculus for "meaning", no hand‑engineered ontology, no built‑in logic engine in these LLMs - or rather, in the NNs underneath them.

And yet, in practice, GPT‑4 and now 5.x can solve riddles like Hinton's "room painting puzzle" or generate creative analogies, such as comparing
the human immune system to a cybersecurity platform. This certainly looks like some sort of "intuitive pattern recognition" operating at a high level of abstraction.

Such models certainly appear to display these phenomena.

In more technical phrasing: to make this work, GPT has to build a compressed internal model of how the world and language co‑vary. When it explains the immune-system-to-cybersecurity analogy, it is not calling upon pre‑written explanations; it is recombining distributed representations that capture causal, physical, and social regularities encoded in its weights.

Philosophers might call this behaviour "intuition derived from massive data", and the label usefully emphasizes that the model's competence does not come from a stack of explicit rules. But on the other side, when you dig into the engineering and the research, this claim overshoots.

Saying "to predict the next word, the system must understand meaning and context" sneaks in a contested philosophical claim under the cover of engineering. There is actuall a gap between functional understanding (using concepts in context to solve tasks) vs grounded understanding (having those concepts tied to sensorimotor experience, goals, and a point of view). GPT‑4 has the former in impressive measure; the latter remains controversial.

Internally, large transformer NNs exhibit emergent structure.

NNs don't use crisp symbols the way a math proof or a logic program does. They don't have a literal DOG = 1, CAT = 2 table or a line of code that says IF A THEN B. Instead, everything is stored as patterns of numbers spread across many neurons.
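
To make that contrast concrete, here's a toy sketch (made-up numbers, not real model weights): a symbolic system's lookup table next to the kind of dense embedding vectors a NN actually stores, where relatedness shows up in the geometry rather than in an explicit table entry.

```python
import numpy as np

# Contrast (toy numbers, purely illustrative): an explicit symbol table vs.
# the dense, distributed representations a neural net actually stores.

# A symbolic system might literally have this:
symbol_table = {"DOG": 1, "CAT": 2, "CAR": 3}

# A neural net instead has an embedding matrix: each token is a row of real
# numbers, and "meaning" lives in the geometry of those rows.
embeddings = {
    "dog": np.array([0.8, 0.1, 0.6, -0.2]),
    "cat": np.array([0.7, 0.2, 0.5, -0.1]),
    "car": np.array([-0.3, 0.9, -0.4, 0.8]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["dog"], embeddings["cat"]))  # high: related concepts
print(cosine(embeddings["dog"], embeddings["car"]))  # low: unrelated concepts
```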

But when these NNs get very large and are trained on lots of data, something interesting happens: patterns inside them start to look a bit like "soft" versions of symbols and variables.

  • Approximate variables: The network can keep track of "the subject of this sentence," "the thing we're comparing" or "the current topic" even though there’s no explicit variable named subject. Some directions in the internal vector space behave as if they were variables.
  • Disentangled features: Different neurons or combinations of neurons respond to different abstract properties (like "past tense", "is an animal", "is polite speech") instead of everything being a total mess. That's what "disentangled" means: the network can separate different aspects of meaning - the probe sketch after this list shows how researchers test for this.
  • Soft symbols: Instead of a single symbol like DOG, you get a fuzzy pattern of activation spread over many neurons. That pattern behaves like a symbol (it can be reused, composed with others, etc). But it's not a clean, discrete thing. So we call them "soft" symbols.
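
One way researchers actually test for this is with a "linear probe": train a simple linear classifier on a model's hidden activations and see whether a property like past tense can be read off along some direction. The sketch below fabricates activations with such a direction baked in so it runs standalone - in real work you would dump activations from an actual model - but the mechanics are the same.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Minimal "linear probe" sketch (synthetic data, illustrative only).
# If a property like "past tense" is encoded along some direction in a
# model's hidden activations, a simple linear classifier trained on those
# activations can read it off. Here we fabricate activations with a hidden
# direction baked in, standing in for activations dumped from a real LLM.

rng = np.random.default_rng(0)
d = 64                                   # hidden size of our pretend model
n = 2000                                 # number of token activations

tense_direction = rng.normal(size=d)     # the "past tense" direction we plant
is_past = rng.integers(0, 2, size=n)     # label: 1 = past tense, 0 = not

# activation = noise + (label-dependent shift along the planted direction)
activations = rng.normal(size=(n, d)) + np.outer(is_past, tense_direction)

probe = LogisticRegression(max_iter=1000)
probe.fit(activations[:1500], is_past[:1500])
acc = probe.score(activations[1500:], is_past[1500:])
print(f"probe accuracy: {acc:.2f}")      # near 1.0: the feature is linearly readable

# The probe's weight vector roughly recovers the direction we planted, which
# is what "some directions behave like variables" means in practice.
learned = probe.coef_[0]
cos = learned @ tense_direction / (np.linalg.norm(learned) * np.linalg.norm(tense_direction))
print(f"cosine(learned direction, planted direction) = {cos:.2f}")
```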
So, are they prediction machines?

The charitable version of the claim is: powerful next‑token prediction on internet‑scale data forces the emergence of something functionally similar to intuitive, flexible, concept‑mediated cognition, implemented in a non‑symbolic, high‑dimensional substrate.

That is both remarkable and unsettling because it shows that sophisticated behavior can arise from simple objectives plus scale, without explicit symbolic architectures.

The uncharitable, and wrong, version is: since they solve riddles and analogies, these systems must "really" understand in the same sense humans do.

Sadly, we are not there yet, and it is an open research question whether scaling, grounding, or architectural changes will close that gap.

To sum this all up: AI today is more than "just" predicting the next word. But its emergent intuition is not magic; it is statistics pushed to the point where the distinctions between prediction, abstraction, and understanding become much harder to draw cleanly.

Perhaps, in 2x years, my intuition here will turn out to be entirely wrong. Let's see.