I’ve been in software for most of my professional life. “Artificial intelligence” was always a natural part of the landscape, but it never really captured my attention. That recently changed.

There’s an enigmatic number floating around in Douglas Adams’s Hitchhiker’s Guide to the Galaxy series. That number is 42. It took a massive effort to arrive at, but nobody understands what question it’s really supposed to answer.

With AI, I’d claim that the number is 66 and the question it’s answering is: how many years will it be until we arrive?

66 years ago, “the Summer Research Project on Artificial Intelligence” was held at Dartmouth College in New Hampshire. It’s more often referred to as the Dartmouth workshop, and it’s generally considered to be the ignition moment of the field.

(Ahem, what about Alan Turing’s address to the London Mathematical Society on February 20th, 1947 – a full nine years before the Dartmouth workshop – where he spoke on the topic of human-level machine intelligence? That speech is indeed acknowledged as the first scientific discussion of AI, but Turing was a lone genius before his time, operating far outside of anything like a field.)

The belief in magic returned

So with 1956 firmly established as a starting point, what turns 2022/23 into such an important milestone? The answer is highly subjective.

To make it slightly less so, I’m going to lean on science fiction author Arthur C. Clarke, who stated that “any sufficiently advanced technology is indistinguishable from magic”.

Or why not quote Virginia Woolf:

Then she got into the lift, for the good reason that the door stood open; and was shot smoothly upwards. The very fabric of life now, she thought as she rose, is magic. In the eighteenth century, we knew how everything was done; but here I rise through the air; I listen to voices in America; I see men flying – but how it’s done I can’t even begin to wonder. So my belief in magic returns.

Woolf wrote Orlando almost a hundred years ago, but the wonder she evokes exactly mirrors what I now feel when I’m playing around with ChatGPT and when I see art created by Midjourney, DALL-E or Stable Diffusion.

Leading up to the magic

I’m old enough to remember when Deep Blue beat world champion Garry Kasparov at chess in 1997. That was awesome, but still within the realm of what could be expected from ever more powerful computers. Deep Blue’s creators at IBM agreed. Here they are in a press release from the time:

Does Deep Blue use artificial intelligence? The short answer is “no”. Earlier computer designs that tried to mimic human thinking weren’t very good at it. No formula exists for intuition… Deep Blue relies more on computational power and a simpler search and evaluation function.

The long answer is also “no”. “Artificial Intelligence” is more successful in science fiction than it is here on earth, and you don’t have to be Isaac Asimov to know why it’s hard to design a machine to mimic a process we don’t understand very well to begin with. How we think is a question without an answer. Deep Blue could never be a HAL-9000 if it tried. Nor would it occur to Deep Blue to “try”.


Almost twenty years later, a computer beat Lee Sedol at Go, a game orders of magnitude more complex than chess.

That was more than just cool.

I found it especially intriguing that the creators of the winning algorithm couldn’t explain how it had happened, and that Sedol himself described AlphaGo’s style as deeply creative and surprising. Perhaps, after all, there was such a thing as a “formula for intuition”.

This Transforms everything

The following year – 2017 – I failed to notice what would prove to be a seminal research breakthrough. The paper that marks this milestone was titled Attention Is All You Need and it gave rise to the Transformer architecture.

Up until that point, state-of-the-art machine learning models had made use of convolutional and recurrent encoder-decoder configurations, connected through what’s called an attention mechanism.

Did you just zone out on me?

That’s perfectly understandable; sorry about the digression. The details really aren’t important, but my point is this: even though the Attention paper was hailed as a disruptive breakthrough that gave rise to technologies such as ChatGPT, the concept it proposed really wasn’t all that radical. Attention had already been part of previous solutions; the ‘only’ new thing about the Transformer architecture was to put it center stage.
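
For the curious, here is a minimal sketch of scaled dot-product attention, the operation the Transformer puts center stage. The function names, shapes and toy data are my own illustration in plain numpy, not anything lifted from the paper or from an actual library implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating, for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (sequence_length, d_model)."""
    d_k = Q.shape[-1]
    # Every query scores every key; scaling by sqrt(d_k) keeps the scores moderate.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns each row of scores into weights that sum to one.
    weights = softmax(scores, axis=-1)
    # The output for each position is a weighted mix of the value vectors.
    return weights @ V

# Toy example: a "sentence" of 4 tokens embedded in 8 dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)  # (4, 8)
```

In the full architecture this operation is wrapped in learned projections and repeated across multiple heads and layers, but the core idea really is this small.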

Disruption vs. continuity

To what extent have historical breakthroughs in the field of AI been truly disruptive? It’s hard to say.

Take deep learning. British-Canadian computer scientist and cognitive psychologist Geoffrey Hinton is often credited with having come up with it. In fact, he was awarded the 2018 Turing Award for it (together with Yoshua Bengio and Yann LeCun).

But Hinton and his crew stood on the shoulders of giants. Pretty much all of their contributions were tweaks on earlier work. For example, Oliver Selfridge had presented his multi-layered neural network “Pandemonium” as early as 1958. Frank Rosenblatt was another guy (guys, guys, guys. Sadly, the history of AI is crowded with guys, especially in the early decades) who laid claim to inventing what we now think of as deep learning (and which Hinton initially thought of as ‘deep belief networks’).

To be fair, there are those who claim that deep learning really was disruptive. US/Taiwanese/Chinese computer scientist/entrepreneur/investor Kai-Fu Lee (whose book I’ve referenced before) is amongst them. He sees the match between Lee Sedol and DeepMind’s winning algorithm AlphaGo as the defining moment for deep learning. The stir it caused in China was similar to how Americans came together around the Apollo program in reaction to the Russians putting Sputnik in orbit.

Still others would claim that Hinton’s real contribution wasn’t deep learning in the first place, but the important technique known as backpropagation, which he helped develop and popularize (in a landmark 1986 paper with David Rumelhart and Ronald Williams) and without which deep learning would not have been possible.

Nevertheless, technologically speaking, the progress towards strong AI seems to be colored more by continuity than by disruption.

So were the nay-sayers wrong?

There’s a reason why roasting is a popular genre in standup comedy. Actually, make that two reasons: the audience likes it when the supposedly successful get revealed as pretentious gasbags, and it’s easy to cut people down to size.

As in comedy, so also in science and technology.

Or actually, you might want to make that science versus technology, because the most rabid detractors of AI can probably be found among scientists.

Nobody got in a better punch than Berkeley philosopher Hubert Dreyfus, who equated the quest for strong AI with alchemy. Here he is in a paper from 1965, foreshadowing the first “AI winter”:

No more striking example exists of an ‘astonishing’ early success and the equally astonishing failure to follow up. According to this definition of progress, the first man to climb a tree could claim tangible progress toward flight to the moon.

Or why not listen to the legendary mathematician and computer scientist Donald Knuth, another recipient of the Turing Award. Here he is in an email to Nils J. Nilsson, one of the founding fathers of AI, dated 1981:

I’m intrigued that AI has by now succeeded in doing essentially everything that requires “thinking” but has failed to do most of what people and animals do “without thinking” – that, somehow, is much harder! I believe the knowledge gained while building AI programs is more important than the use of the programs.

Scruffies vs. Neats

Very little of the AI-roasting was actually done for the sheer joy of backstabbing. Instead, it was an expression of concern that the field as a whole was moving in the wrong direction.

Because broadly speaking, there have always been two competing approaches to AI. Historically, the dominant one has been rule-based. The success of rule-based ‘expert systems’ was what kept the lights on in AI labs for a long time. It’s what enabled IBM’s Deep Blue to beat Garry Kasparov and it’s what powered the AI tools that entered medicine during the ’80s.

But when rule-based systems hit a wall toward the end of that decade, their failure also set off what’s referred to as the second AI winter.

And that bleak period was actually brought on by AI practitioners themselves, who often felt that they had promised more than they could deliver. Here’s Drew McDermott from 1985:

Unfortunately, the more you attempt to push the logicist project, the less deduction you find. What you find instead is that many inferences which seem so straightforward that they must be deductions turn out to have non-deductive components. […] Think of the last time you made a plan, and ask yourself if you could have proven the plan would work. Chances are you could easily cite ten circumstances under which the plan would not work, but you went ahead and adopted it anyway.

McDermott himself was very much a part of what he calls “the logicist project”, which represented the rule-based AI paradigm as a whole.

Out of the ashes of the second AI winter rose the Scruffies, a tribe of computer scientists who contrasted themselves with what they labeled the old guard of rule-based Neats.

Scruffies represented a revived interest in neural networks, which had been out of favor for decades (ever since the first AI winter hit in the early seventies).

Rise of the robots

Robotics as a field is a bit of a bastard in that it could lay varying claims to parentage. Mechanical engineering, with its pursuit of automating industrial production lines, is one obvious place to start looking. But a less utilitarian view makes modern robotics look like simply an offshoot of artificial intelligence.

Here’s what I mean by that: Marvin Minsky, John McCarthy, Andrew Ng and many other people whom we think of as innovators within the field of AI actually turned to robotics in order to explore their theories.

Part of the reason is that robots sense the world, which opens up interesting opportunities. If you want to build a self-driving car, for example, the hard problems will mostly boil down to computer vision and control engineering, both of which are core disciplines of AI. The same goes for having autonomous helicopters teach themselves to fly stunts. These days, if you want to find the frontier of AI, you could do worse than trying your hand at swarm orchestration and distributed AI.

Are we there yet, dad?

People have been awed by technological progress throughout the ages. It’s certainly a privilege to live through a historic period when technology repeatedly does that to you, but it also means running the risk of getting a bit jaded, because things are never actually as promising as they seem to be.

Nobel (and Turing Award) laureate Herb Simon was off by three decades in his prediction of when computers would beat humans at chess.

Vint Cerf and Robert Kahn, who pretty much invented the Internet (and were also awarded the Turing Award; it’s beginning to look as if they hand it out like candy!), had high hopes that their Strategic Computing program – one of the most expensive R&D projects ever – would take AI ‘out of the labs’. That was in the ’80s. The program didn’t make much of a dent.

Twenty years ago, DARPA poured immense resources into project Möbius, which was supposed to “create a computer-understandable knowledge base whose content mirrors that of the web.” It was hailed as an astounding success before it was mothballed.

All in all: it’s fair to say that there’s been a good number of false starts and premature celebrations.

Or maybe that’s the wrong way to think about it. Perhaps every aha moment that we’ve arrived at has a certain validity of its own, given its historical context.

So even if history will prove me wrong, I’ll persist in saying that from my point of view it seems evident that we’ve just arrived, that AI finally did deliver on its promise. I just feel it in my bones. (Plus, I see proof of it in the explosive wave of creativity being released, both in the labs of academia and on the Internet at large.)

But I don’t want to be so presumptuous as to get the last word in this never ending story.

Let’s leave that instead to Andrew Ng. He’s the guy who taught helicopters tricks with reinforcement learning when he was a young professor at Stanford 15 years ago.

He then went on to found Coursera and to do other useful things. His newsletter pops up in my inbox every now and then; here’s what he recently had to say about the current state of affairs:

The latest LLMs [= Large Language Models] exhibit some superhuman abilities, just as a calculator exhibits superhuman abilities in arithmetic. At the same time, there are many things that humans can learn that AI agents today are far from being able to learn.

If you want to chart a course toward AGI [=Artificial General Intelligence], I think the baby steps we’re making are very exciting. Even though LLMs are famous for shallow reasoning and making things up, researchers have improved their reasoning ability by prompting them through a chain of thoughts (draw one conclusion, use it to draw a more sophisticated conclusion, and so on).

To be clear, though, in the past year, I think we’ve made one year of wildly exciting progress in what might be a 50- or 100-year journey. Benchmarking against humans and animals doesn’t seem to be the most useful question to focus on at the moment, given that AI is simultaneously far from reaching this goal and also surpasses it in valuable ways. I’d rather focus on the exciting task of putting these technologies to work to solve important applications, while also addressing realistic risks of harm.

Andrew Ng, quoted from his newsletter The Batch: What matters in AI right now, March 22nd, 2023
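
To make the “chain of thoughts” idea a bit more concrete, here is a toy sketch of what such a prompt can look like. Everything in it (the questions, the wording, the formatting) is made up for illustration and is not taken from Ng’s newsletter or from any particular system:

```python
# A toy illustration of chain-of-thought prompting: instead of asking for the
# answer directly, the prompt includes a worked example whose reasoning is
# spelled out step by step, nudging the model to reason the same way on a new
# question. All of the text below is invented for illustration.
direct_prompt = "Q: A meeting starts at 09:20 and ends at 11:05. How long is it?\nA:"

chain_of_thought_prompt = """\
Q: A train leaves at 14:10 and arrives at 16:45. How long is the trip?
A: From 14:10 to 16:10 is 2 hours. From 16:10 to 16:45 is 35 more minutes.
So the trip takes 2 hours and 35 minutes. The answer is 2 h 35 min.

Q: A meeting starts at 09:20 and ends at 11:05. How long is it?
A:"""

# Either string would be sent to a large language model as its prompt; with the
# second one, the model tends to spell out intermediate steps before answering.
print(chain_of_thought_prompt)
```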

Much of this post was inspired by Nils J. Nilsson’s excellent book The Quest for Artificial Intelligence: A History of Ideas and Achievements.

At 500+ pages and with a somewhat technical approach it was sometimes a challenging read, but I heartily recommend it to anyone interested in getting a strong understanding of how the field has evolved over time. Born in 1933, Nilsson came of age as a (computer) scientist just as AI started to take off, and he remained intimately involved up until his book was published in 2010.

The long backstory to how deep learning came to be didn’t make it into this post, but I find it interesting to note that one of its true pioneers was a Swede and even a fellow KTH:er. Ulf Grenander, who died a few years back, was a professor of applied mathematics here around the time when I was born (although he spent most of his career in the US). His contributions to the field of statistics were key to so-called hierarchical models, which were really the first instance of neural networks using Bayesian inference to propagate probabilities between layers.

This type of hierarchical architecture was first proposed by Mumford and Lee in 2003. The year after that, Jeff Hawkins’ book On Intelligence came out. I immediately devoured it, not knowing that Hawkins’ novel views on how the brain works would strongly influence Geoffrey Hinton and his group, who published some of the key papers that would allow for deep learning only two years later, in 2006.

I’d also like to take the opportunity to thank my good friend Gustav Eje Henter, who’s always there to sate my curiosity about all things related to AI.