Deep learning hasn’t been around for very long. It was only in 2012 that a team using the technology first won the ImageNet Large Scale Visual Recognition Challenge, and that was with an error rate of about 25 percent. Now, ten years later, deep learning is of course a foundational component of computer vision and tends to radically outperform humans.

Its pervasiveness notwithstanding, it’s not entirely evident how to define what deep learning is. Yann LeCun, Turing Award laureate and inventor of convolutional neural networks (a type of architecture especially popular in machine vision), says:

Deep learning is not an “algorithm”. It’s merely the concept of building a machine by assembling parameterized functional blocks and training them with some sort of gradient-based optimization method. That’s it. You are free to choose your architecture, learning paradigm, etc.
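To make that description concrete, here is a toy sketch, assuming PyTorch; the architecture, data and hyperparameters are arbitrary placeholders, not anything LeCun prescribes. It assembles a few parameterized functional blocks and trains them with a gradient-based optimization method, and that really is all there is to it.

```python
import torch
from torch import nn

# Assemble parameterized functional blocks into a "machine"...
model = nn.Sequential(
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# ...and train them with some sort of gradient-based optimization method.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.rand(64, 28 * 28)          # placeholder data
labels = torch.randint(0, 10, (64,))      # placeholder labels

for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), labels)
    loss.backward()
    optimizer.step()
```

Everything else (which blocks, which optimizer, which learning paradigm) is, as LeCun says, up to you.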

If deep learning comes in many flavours, it has at least been clear how to measure its effectiveness. Whether a particular implementation is meant to understand the world visually, make sense of human speech, or predict the optimal folding of a protein, it all comes down to accuracy.

That is, until a research paper with the telling title Is Robustness the Cost of Accuracy? was published in early 2019.

In it, 18 deep learning models with exceptional track records in the aforementioned ImageNet competition were exposed to different types of adversarial attack vectors. The results were disquieting: the better a model performs in terms of accuracy, the more vulnerable it is to these types of attacks.

I’ve written before about adversarial attacks, but let’s briefly recap:

These are not your ordinary, run-of-the-mill cyber-security threats. You can’t protect yourself with anti-malware software, intelligent compilers flagging deprecated libraries, or static/dynamic analysers, however great those tools may be at finding security flaws in your code.

That’s because adversarial attacks don’t depend on exploiting bugs, not as such. That’s why researchers at Tencent were not awarded Tesla’s bug bounty when they proved that they could manipulate the world’s most sophisticated autopilot without gaining access to any of its internals. Instead, the worrying experiment described in this paper shows how a state-of-the-art Tesla, with all the latest firmware patches in place, could be made to steer straight into oncoming traffic simply by placing three innocuous little stickers on the ground in front of the car.

Oh, and they also managed to switch the windshield wipers on and off using similar kinds of stickers.

This is how Tesla’s security team justified the decision that the research group in question was not eligible for the bug bounty:

The findings are all based on scenarios in which the physical environment around the vehicle is artificially altered to make the automatic windshield wipers or Autopilot system behave differently, which is not a realistic concern given that a driver can easily override Autopilot at any time by using the steering wheel or brakes and should always be prepared to do so and can manually operate the windshield wiper settings at all times.

If I were so lucky as to own a Tesla, I’m not sure I’d be entirely convinced by this dichotomy between onboard bugs and ‘external factors’, especially not if a consequence of the latter would be a head-on collision.

Stickers like the ones used in the Tesla hack are known as adversarial patches. It’s been shown before that they can be applied to stop signs to make them disappear to autonomous vehicles. Versions of the same technique can also render a human being invisible to surveillance systems (see the previous post for links to papers).

Adversarial patching is an example of model evasion, a class of attack techniques designed to fool a deep neural network into making incorrect classifications at test time.
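To make the mechanics concrete, here is a minimal sketch of how such a patch could be trained, assuming PyTorch and a pretrained ImageNet classifier from torchvision; the model choice, patch size, target class and hyperparameters are illustrative assumptions, and real attacks add tricks such as random rotation, scaling and printability constraints.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Frozen, pretrained classifier standing in for the victim model.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)

patch = torch.rand(3, 50, 50, requires_grad=True)  # the trainable "sticker"
optimizer = torch.optim.Adam([patch], lr=0.01)
target_class = 859                                 # class the patch should force

def paste(images, patch):
    """Paste the patch at a random position, keeping gradients w.r.t. the patch."""
    _, _, h, w = images.shape
    _, ph, pw = patch.shape
    y = torch.randint(0, h - ph + 1, (1,)).item()
    x = torch.randint(0, w - pw + 1, (1,)).item()
    pad = (x, w - x - pw, y, h - y - ph)           # left, right, top, bottom
    mask = F.pad(torch.ones_like(patch), pad)
    return images * (1 - mask) + F.pad(patch.clamp(0, 1), pad) * mask

# Placeholder data; a real attack would use natural images in [0, 1]
# (the model's input normalization is omitted for brevity).
loader = [(torch.rand(8, 3, 224, 224), None) for _ in range(10)]

for images, _ in loader:
    optimizer.zero_grad()
    logits = model(paste(images, patch))
    labels = torch.full((images.shape[0],), target_class)
    loss = F.cross_entropy(logits, labels)  # push every patched image to the target class
    loss.backward()
    optimizer.step()
```

Because the patch is optimized over random placements rather than tailored to one specific image, it can be printed out and dropped into the physical scene, which is exactly what makes sticker-style attacks practical.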

However powerful these techniques are, they also have shortcomings. Think of someone trying to trick face-detection cameras at an airport: security personnel will have been trained to spot adversarial patches, so in a situation like that it would be very difficult to apply adversarial perturbations to the test-time input.

The alternative to model evasion is called data poisoning, a process that happens at training time. Here’s how the computer science department at the University of Maryland describes it:

These attacks aim to manipulate the performance of a system by inserting carefully constructed poison instances into the training data. Sometimes, a system can be poisoned with just one single poison image, and this image won’t look suspicious, even to a trained observer.
[…]
In “clean label” attacks, the poison image looks totally innocuous and is labelled properly according to a human observer. This makes it possible to poison a machine learning dataset without having any inside access to the dataset creation process. Clean label attacks are a threat when…

– An attacker leaves an image on the web, and waits for it to be picked up by a bot that scrapes data. The image is then labelled by an expert, and placed into the dataset.

– A malicious insider wants to execute an attack that will not be detected by an auditor or supervisor.

– Unverified users can submit training data: this is the case for numerous malware databases, spam filters, smart cameras, etc.

For example, suppose a company asks employees to submit a photo ID for its facial recognition control system; an employee provides a poisoned photo, and this gives her back-door control of the face recognition system.

For the full read, go check out the paper Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks.
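The core trick in that paper is a feature collision: the poison is optimized to look like a harmless base image in pixel space while landing right next to the target instance in the network’s feature space. Here is a minimal sketch of that idea, assuming PyTorch; the feature extractor, step count and beta weighting are illustrative assumptions rather than the paper’s exact settings.

```python
import torch
from torchvision import models

# A frozen, pretrained network whose penultimate layer serves as the feature space.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1])
for p in feature_extractor.parameters():
    p.requires_grad_(False)

def craft_poison(base, target, steps=200, lr=0.01, beta=0.1):
    """base, target: image tensors of shape (1, 3, 224, 224) with values in [0, 1]."""
    poison = base.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([poison], lr=lr)
    target_features = feature_extractor(target).flatten(1)
    for _ in range(steps):
        optimizer.zero_grad()
        poison_features = feature_extractor(poison).flatten(1)
        # Collide with the target in feature space while staying visually
        # close to the base image, which is what keeps the label "clean".
        loss = (
            ((poison_features - target_features) ** 2).sum()
            + beta * ((poison - base) ** 2).sum()
        )
        loss.backward()
        optimizer.step()
        poison.data.clamp_(0, 1)  # keep the poison a valid image
    return poison.detach()

# Hypothetical usage: craft_poison(innocuous_base_image, target_image) yields an
# image that still looks like, and gets labelled as, the base, yet sits next to
# the target in feature space, so that after retraining the target tends to be
# misclassified with the base's label.
```

None of the helper names above come from the paper itself; they are just there to show how little machinery a clean-label poison actually requires.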

So once again, I feel obliged to end on a tentatively pessimistic note: if there’s an inverse correlation between how accurate machine learning models are and how robust they are to adversarial attacks, doesn’t that mean we’re pretty much painted into a corner? I bumped into a PhD student the other day who’s pursuing this exact question in his research, so I asked him whether the risks are hyped. Here’s his reply:

This is indeed a huge problem. Gartner estimates that in 2022, thirty percent of all cyber attacks will be conducted using adversarial examples, data poisoning and model theft, and some Microsoft researchers recently conducted an analysis which found that basically very, very few companies are ready for these attacks.