I recently came across a debate article where a group of AI researchers argue that large language models aren’t as smart as they appear to be.

Computational linguist Emily M. Bender made a similar argument in her much-cited 2021 "stochastic parrots" paper, in which she claimed that just because LLMs seem convincing (even to the point of passing the Turing test) doesn't mean they can think.

Is it possible to prove that LLMs are more than fancy auto-complete algorithms? That they can exceed the limitations of the data they've been trained on?

Well, it's been complicated, since only the companies developing the largest models have access to the test and training data.

Which means the conversation has pretty much been reduced to a battle of opinions.

That might be changing now.

Quanta Magazine just published an article about how Princeton mathematician Sanjeev Arora and Anirudh Goyal, a grad student at the University of Montreal (where he's supervised by Yoshua Bengio), have managed to mathematically prove that:

As these models get bigger and are trained on more data, they improve on individual language-related abilities and also develop new ones by combining skills in a manner that hints at understanding — combinations that were unlikely to exist in the training data.

New Theory Suggests Chatbots Can Understand Text | Quanta Magazine, 24.01.22

The proof is based on a combination of random graph theory and the neural scaling law, which states that as LLMs get bigger, their loss on test data (the difference between predicted and correct answers on new texts, after training) decreases in a very specific manner. That manner can be modelled with a bipartite graph, and the graph turns out to carry important information about how the model gains skills.
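To get a feel for why a bipartite graph of texts and skills matters, here's a toy sketch of my own (not Arora & Goyal's actual construction, and all names and numbers are made up): each text node connects to the few skills needed to understand it, and as the corpus grows, the number of distinct skill *combinations* exercised grows far faster than the number of individual skills.

```python
# Toy illustration: a random bipartite graph linking texts to the
# skills they require. We count how many distinct skill PAIRS are
# exercised by at least one text as the corpus grows.
import random
from itertools import combinations

random.seed(0)

def skill_pairs(num_texts, num_skills, skills_per_text=3):
    """Build a random text-skill bipartite graph and return the set
    of distinct skill pairs that co-occur in at least one text."""
    pairs = set()
    for _ in range(num_texts):
        # Each text is linked to a few randomly chosen skills.
        needed = random.sample(range(num_skills), skills_per_text)
        pairs.update(combinations(sorted(needed), 2))
    return pairs

small = skill_pairs(num_texts=100, num_skills=50)
large = skill_pairs(num_texts=10_000, num_skills=50)

# The larger "corpus" covers far more skill combinations, even though
# the underlying skill set is identical.
print(len(small), len(large))
```

The point of the toy: with 50 skills there are 1,225 possible pairs, and a small corpus only touches a fraction of them. A model that handles combinations absent from its training texts is, in this picture, doing something beyond retrieval.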

And the real beauty: the technique works without access to the test and training data.

This mathematical black magic means we've now got proof that the largest LLMs don't just mimic what's in their training data. That they "must be doing generalization," to quote Sébastien Bubeck.

He’s vice president of generative AI at Microsoft, and was not part of Arora & Goyal’s study.

In a funny kind of way, a study like this seems to produce knowledge which extends our ignorance.

We used to be able to assume that chatbots were nothing but dumb parrots, that language is nothing but "an interface to intelligence." Now that we've proven ourselves wrong, I guess we need to retreat to higher ground in order to keep up the illusion of our species' cognitive superiority.