The Sim-To-Real Gap - Slow thoughts

Lev Manovich is best known as the author of the theory tome The Language of New Media (mandatory reading when I studied interaction design at college) and perhaps, for the initiated few, as a contributor to the visual effects of the first Tron movie.

But he’s also a trained programmer. Back when he studied computer science in the Soviet Union of the 1970s, you wrote your programs with pen and paper, and only when you were ‘done’ did you get to type the code into an actual machine (there were precious few of them) to see if it would run.

Anyone who’s come close to coding will understand the challenges. Programming is an iterative process where you learn through trial and error. Preferably in short feedback loops.

These days robots have a similar kind of problem. The way robots interact with the world used to be determined by line-by-line instructions, fed to them by a godlike programmer.

That all changed with the machine learning revolution. A paradigm that meant roboticists could start telling the robots what to achieve, and let the machines figure out how.

There was just one catch: while watching babies and kittens learn by doing is cute, you don’t want to get in the way of a ten-ton industrial robot busy figuring out the fundamentals. You also don’t want a soon-to-be autonomous car cruising your neighbourhood to pick up traffic rules.

There are two possible solutions to this problem. You either create a controlled environment where crashing into stuff is ok, or you let the robot (broadly speaking, really any type of autonomous agent) learn the ropes in a simulation.

When it comes to most brands of robotics (with the exception of drones) the second alternative tends to be the default choice.

Training in simulated environments has many advantages. Machine learning algorithms are hungry for data and in virtual worlds they can have all they can eat. Think of it like when Neo jacks into the construct to learn kung fu in record time in the first Matrix movie.

But Neo also encounters the limits of what can be achieved in simulation mode, and is shocked by the harsh transition from it to the actual reality.

Robots would feel Neo’s pain. Even when they get everything perfectly right in sim, they’re still bound to mess things up when they try to pull off the same moves in reality. That’s what is called the sim-to-real gap, described in the research paper Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics like so:

Simulation-based training provides data at low-cost, but involves inherent mismatches with real-world settings. Bridging the gap between simulation and reality requires, first of all, methods that are able to account for mismatches in both sensing and actuation.
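
To make the two kinds of mismatch concrete, here is a toy sketch of my own (not taken from the paper). A made-up one-dimensional robot is tuned in an idealized simulation and then run in a stand-in for reality with a one-step actuation delay and a slightly biased position sensor. The function `rollout`, its parameters, and every number in it are hypothetical, chosen only for illustration.

```python
def rollout(gain, steps=50, dt=0.1, target=1.0, delay=0, sensor_bias=0.0):
    """Drive a 1-D position toward `target` with a proportional controller.

    delay: number of steps a command waits before the actuator applies it.
    sensor_bias: constant offset added to every position reading.
    Returns the mean absolute distance to the target over the rollout.
    """
    x = 0.0
    pending = [0.0] * delay  # commands issued but not yet applied
    total_error = 0.0
    for _ in range(steps):
        measured = x + sensor_bias                  # sensing mismatch
        pending.append(gain * (target - measured))  # controller output
        x += dt * pending.pop(0)                    # actuation mismatch (lag)
        total_error += abs(target - x)
    return total_error / steps


# "Training": pick the gain that does best in the idealized simulation.
candidate_gains = [1.0, 2.0, 5.0, 10.0]
best_gain = min(candidate_gains, key=rollout)

# Same controller, evaluated in sim and in the stand-in for reality.
sim_error = rollout(best_gain)
real_error = rollout(best_gain, delay=1, sensor_bias=0.05)

print(f"best gain in sim: {best_gain}")
print(f"mean error in sim:  {sim_error:.3f}")
print(f"mean error in real: {real_error:.3f}")
```

In the simulation the most aggressive gain wins because it reaches the target in a single step. With the delay and the bias added, that same gain keeps overshooting and never settles, so the controller that looked perfect in sim turns out to be a poor one in "reality".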

It makes me want to end on another pop cultural reference, this time from the original Star Wars movie. It’s something Han Solo says when Luke practices blindfolded against a small spherical robot that levitates around him shooting plasma beams:

Look, good against remotes is one thing. Good against the living, that’s something else.