In the movie The Blues Brothers, an early church scene finds Jake and Elwood standing bored at the back of the church, until the choir begins to sing and divine light streams through the stained-glass window. Jake has an epiphany: he’s going to bring the band back together; he sees the light!

Wonderfully corny.

I thought of that scene as I came across British zoologist Andrew Parker’s Light Switch Hypothesis, a simple yet plausible explanation for the explosive acceleration of evolution that took place on Earth during the Cambrian period. Parker argues that it was triggered by the fluke development of photosensitivity. Once living beings started perceiving light, there was no turning back; the game was on.

In other words, vision is at the root of intelligent behaviour.

One person strongly influenced by Parker’s ideas was Fei-Fei Li, author of The Worlds I See: Curiosity, Exploration, and Discovery at the Dawn of AI.

Li is now a professor of computer science at Stanford, but back in 1980, when The Blues Brothers was released, she was a four-year-old living in a cramped apartment on the outskirts of Beijing. She was the daughter of free-spirited parents who failed so badly at fitting into the narrow categories available to good Chinese citizens that they eventually had to leave everything behind and set out to start a new life in America.

The Worlds I See is as much an extended biography of Li’s family as it is a fascinating first-hand account from someone who has played a pivotal role in giving birth to artificial intelligence, and especially to the branch of it called machine vision.

***

If you know anything about machine vision, you might know that 2012 was a landmark year.

That’s when Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton blew the competition out of the water with their convolutional neural network AlexNet, which won the ImageNet Large Scale Visual Recognition Challenge (the competition Fei-Fei Li is famous for having founded, but we’ll get back to that) by a revolutionary margin.

It’s the year you wish you’d invested in Nvidia, as ‘deep learning’ became the buzzword of the tech world.

It’s easy to think 2012 is when neural networks were invented, but nothing could be further from the truth.

***

In fact, if you want to understand the origins, you have to start way back in the early 1940s. That’s when neurophysiologist Warren McCulloch and logician Walter Pitts published A Logical Calculus of the Ideas Immanent in Nervous Activity, demonstrating how neural networks could compute any arithmetic or logical function.

That paper, in the words of Fei-Fei Li, was “the neuro-scientific equivalent of splitting the atom”.
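To get a feel for what McCulloch and Pitts actually showed, here’s a minimal sketch of their formal neuron in Python (mine, not theirs): a unit that fires when the weighted sum of its binary inputs reaches a threshold. Wire a few of these together and you get logic gates, and from gates, any logical function.

```python
def mp_neuron(inputs, weights, threshold):
    """A McCulloch-Pitts unit: fires (1) iff the weighted sum of its
    binary inputs reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

# Logic gates as single units
AND = lambda a, b: mp_neuron([a, b], [1, 1], threshold=2)
OR  = lambda a, b: mp_neuron([a, b], [1, 1], threshold=1)
NOT = lambda a:    mp_neuron([a],    [-1],   threshold=0)

# Compose units into a small network: XOR = (a OR b) AND NOT (a AND b)
XOR = lambda a, b: AND(OR(a, b), NOT(AND(a, b)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", XOR(a, b))   # prints the XOR truth table
```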

Two important developments happened about fifteen years after that paper. One was Frank Rosenblatt’s “perceptron”, the very first pre-digital, steam-punky implementation of a non-biological neuron. The other was David Hubel and Torsten Wiesel’s breakthrough studies of the mammalian visual cortex, which established the principles upon which convolutional neural networks would eventually be developed.

(Torsten Wiesel, a Swedish expat and Nobel laureate, is still very much alive and kicking at age 100. I recently came across a lovely profile of him).
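What Rosenblatt added to the McCulloch-Pitts picture was learning: the weights aren’t set by hand but adjusted from examples. Here’s a minimal sketch of his learning rule (in modern Python, nothing like his room-sized hardware), trained on a toy OR problem of my own choosing:

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Rosenblatt's rule: whenever an example is misclassified,
    nudge the weights toward (or away from) that example."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            prediction = int(w @ x + b > 0)
            error = target - prediction      # -1, 0 or +1
            w += lr * error * x
            b += lr * error
    return w, b

# A linearly separable toy problem: logical OR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])
w, b = train_perceptron(X, y)
print([int(w @ x + b > 0) for x in X])   # -> [0, 1, 1, 1]
```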

Fast forward another two decades, and the Japanese computer scientist Kunihiko Fukushima implements a layered neural network modeled on Hubel and Wiesel’s findings, a direct forerunner of the convolutional nets to come. He calls it the Neocognitron. (Fei-Fei Li is now four years old.)

The most important development of the ensuing decade was American psychologist and cognitive scientist David Rumelhart’s paper, published in 1986, in which the technique known as back-propagation saw the light of day.
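Stripped of the fanfare, back-propagation is the chain rule from calculus applied layer by layer, so that every weight in the network can be nudged in the direction that reduces the error. Here’s a toy sketch (my own NumPy rendering, not the paper’s formulation; the layer sizes and learning rate are arbitrary) of a two-layer network learning XOR, the very function a single perceptron can’t represent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# A tiny network: 2 inputs -> 8 hidden units -> 1 output
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)       # hidden activations, shape (4, 8)
    out = sigmoid(h @ W2 + b2)     # network output, shape (4, 1)

    # Backward pass: the chain rule, from the error back to each weight
    d_out = (out - y) * out * (1 - out)     # error at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)      # error propagated one layer back

    # Gradient-descent updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(3).ravel())   # should approach [0, 1, 1, 0]
```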

One of Rumelhart’s co-authors on that 1986 paper was Geoffrey Hinton. The same Hinton who would get credit for having “invented machine learning” some thirty years later. Also the same Hinton who’d make waves another decade on, which brings us to the present day, when he recently quit Google over concerns about AI safety.

One of Geoff Hinton’s very first postdocs was a bright young Frenchman with a freshly minted PhD in computer science. His name was Yann LeCun.

Yann LeCun was the first person to take neural networks out of the lab. His ‘LeNet’ (cute name!) brought machine reading to both mail sorting and automated handling of checks.
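For the curious, here’s a rough PyTorch sketch of what a LeNet-style digit reader looks like (not LeCun’s original code; the layer sizes follow the commonly described LeNet-5 shape but are my assumption here): convolution and pooling stages that learn local visual features, followed by fully connected layers that read off the digit.

```python
import torch
import torch.nn as nn

class LeNetStyle(nn.Module):
    """A LeNet-flavoured convolutional network for 28x28 grayscale digits."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 28x28 -> 24x24, 6 feature maps
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 24x24 -> 12x12
            nn.Conv2d(6, 16, kernel_size=5),  # 12x12 -> 8x8, 16 feature maps
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 8x8 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),       # one score per digit
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# A batch of four fake 28x28 digit images, just to check the shapes
logits = LeNetStyle()(torch.randn(4, 1, 28, 28))
print(logits.shape)   # torch.Size([4, 10])
```

AlexNet, which we met earlier, is essentially this same idea scaled up: more layers, more filters, and GPUs to make the training tractable.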

We’re now in the late ’80s and early ’90s. This is where the promising development of neural networks hits a bump in the road. The theory is sound, but the technology just isn’t mature enough yet. In the public eye, it looks like the field has over-promised and under-delivered. What follows is one of many so-called AI winters.

***

During this lull, Fei-Fei Li immigrated to the US, completed her undergraduate degree in physics at Princeton, and went on to do her graduate studies in electrical engineering at Caltech.

That’s where she sank her teeth into a project that would become known as “Caltech 101”.

Caltech 101 was one of the very first datasets of labeled images. When it was published, it comprised 9,144 photos categorised into 101 object classes.

By the standards of its time, it was GI-NOR-MOUS.

By the standards of what we’ve come to expect since then, it was tiny.

Li does a great job of conveying what it felt like to live through that mind-shift. She’d been pouring her soul, not to mention countless hours, into Caltech 101. She was certain it would be the key to unlocking the potential of neural networks, and she was hugely frustrated to see it fall short of those expectations.

If one hundred and one categories weren’t enough, how many would be? The answer to that question seemed nightmarish when it started coming into view.

***

For context, we need to do another bit of historical back-propagation here, going back again to the mid-eighties. That’s when cognitive scientist and psychologist Irving Biederman published the theory known as Recognition-by-Components, attempting to explain how visual cognition can be so surprisingly efficient.

Essentially, what Biederman claimed was that all the visual objects we’re able to recognize are formed from simple 3D shapes, like cylinders, cones, and blocks, that can be combined in various ways to represent complex objects. He dubbed these shapes Geons and posited that there are 36 of them. The number of relevant ways in which Geons can combine is known as the Biederman number, and it’s around 30,000.

That’s two orders of magnitude more than 101.

***

How do you even start to take such a leap? That was the question that briefly occupied Fei-Fei Li and the first PhD student she got to supervise: a plucky young computer scientist named Jia Deng (yet another name destined for AI’s Hall of Fame). The two of them soon threw caution to the wind and simply went to work.

The endeavour they embarked upon would become ImageNet, one of the most significant achievements in the history of artificial intelligence. The road to it, however, was long and winding, and its creators very nearly abandoned the project many times over.

Even Jitendra Malik, one of Li’s heroes and a prominent figure in machine vision, doubted the sanity of the attempt, warning his protégée: “The trick to science is to grow with your field. Not to leap so far ahead of it.”

Malik’s scepticism is understandable. ImageNet, first published in 2009, would eventually comprise 15 million images, labeled and sorted into 22,000 categories (a little short of Biederman’s number, but still in the right ballpark). Seen from the starting line, it must have looked like an impossibly ambitious project.

The fact that it *was* possible is a testament to the engineering ingenuity of Li & Deng, as well as to their stubborn commitment in the face of overwhelming odds.

It wasn’t just that, however; it was also a happy coincidence that they embarked on their project at a time when the Internet had made crowdsourcing viable. Amazon launched its Mechanical Turk service in November 2005 (marketed as “Artificial artificial intelligence”), and thanks to it, ImageNet eventually had over 48,000 contributors from 167 countries.

***

At first, this titanic effort didn’t seem to pay off.

Following the playbook of PASCAL VOC—up until then the largest repository of labeled visual data—Li & Deng had invited the research community to a yearly image recognition competition. The effect was underwhelming. If the bet had been that curating enough data would unleash the potential of machine learning, it simply didn’t seem to have been a wise gamble. The winning contributions of the first two years—2010 and 2011—improved only marginally over the state of the art. Quoting Li: “To say it was ‘humbling’ would be an understatement.”

It’s darkest before dawn, however; the winning entry of the competition’s third year changed everything:

I still had trouble believing AlexNet was the advance it seemed to be. The leap seemed too great. But the more I thought about it, the more it seemed to bear the hallmark of every great breakthrough: the veneer of lunacy, wrapped around an idea that just might make sense.

The story could have ended here, with the camera panning to the horizon as Li and Deng ride into the sunset.

Interestingly, and bravely, Li chose instead to end her narrative on a darker note. For after all her triumphs, she took a job as vice president of AI at Google in 2017, a position and a timing that gave her a front-row seat to the great techlash.

Biased training data, vulnerability to adversarial attacks, insufficient regulatory control: as deep learning picked up pace, these seemed to combine into a perfect storm. Li again:

As scary as each of these issues was in isolation, they pointed toward a future that would be characterized by less oversight, more inequality, and, in the wrong hands, possibly even a kind of looming, digital authoritarianism. It was an awkward thought to process while walking the halls of one of the world’s largest companies, especially when I considered my colleagues’ sincerity and good intentions. These were institutional issues, not personal ones, and the lack of obvious mustache-twirling villains only made the challenge more confounding.

It is with these cautionary words that Fei-Fei Li leaves the reader. I’m struck by how this somber note seems to be a recurring theme with people who know AI from the inside. I’ve written before about Mustafa Suleyman (co-founder of DeepMind and of Inflection, now at Microsoft) and his book The Coming Wave, which is also kind of a downer.

Then again, perhaps it’s not a question of optimism or pessimism, but of sobering up from the hype, gaining a clear-eyed view of our past to better shape our future.