
How to Hack Neural Networks


If only neurologist Oliver Sacks, who wrote "The Man Who Mistook His Wife for a Hat," were still alive! He would find today's neural networks (the hot new trend from the artificial intelligence community) extremely amusing.

His book describes a man whose brain damage leads him to mistake his wife's head for a hat. Maybe there are more parallels between the brain and artificial neural networks than meet the eye (no pun intended).

Neural networks are increasingly being used in information security to provide a higher level of protection, including against zero-day attacks. But what if the adversary targeted the neural network or machine-learning algorithm itself?

In a recent article, Adam Geitgey describes an algorithm, and even provides code, for tricking a neural-network-based image recognition system into identifying a photo of a cat as a toaster (a rough sketch in code follows the steps):

  1. Feed in the (cat) photo that we want to hack.
  2. Check the neural network’s prediction and see how far off it is from the answer we want to get for this photo.
  3. Tweak our photo using back-propagation to make the final prediction slightly closer to the answer we want to get.
  4. Repeat steps 1–3 a few thousand times with the same photo until the network gives us the answer we want.
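
Under the hood, this loop is ordinary gradient descent, except the gradient is taken with respect to the input pixels rather than the model's weights. Below is a minimal PyTorch sketch of the four steps, not Geitgey's actual code: the tiny randomly initialized model, the 64x64 random "photo," and the "toaster" class index are all stand-ins (his article targets a real pre-trained classifier).

    import torch
    import torch.nn as nn

    # Stand-in classifier (assumption): any differentiable image model works.
    model = nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(8, 1000),               # 1000 hypothetical classes
    )
    model.eval()

    TOASTER = 859                         # hypothetical "toaster" class index
    loss_fn = nn.CrossEntropyLoss()

    image = torch.rand(1, 3, 64, 64)      # step 1: the (cat) photo to hack
    hacked = image.clone().requires_grad_(True)

    for _ in range(2000):                 # step 4: repeat a few thousand times
        logits = model(hacked)            # step 2: check the prediction
        if logits.argmax(dim=1).item() == TOASTER:
            break                         # the network now answers "toaster"
        loss = loss_fn(logits, torch.tensor([TOASTER]))
        model.zero_grad()
        loss.backward()                   # step 3: back-propagate to the INPUT
        with torch.no_grad():
            hacked -= 0.01 * hacked.grad.sign()  # nudge toward "toaster"
            hacked.clamp_(0, 1)           # keep it a valid image
            hacked.grad.zero_()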

Note that knowledge of the neural network's internals (its architecture and weights) is required in order to leverage back-propagation. However, this approach is not new, and other examples of misleading inputs causing machine learning to fail are known, such as defaced stop signs that autonomous vehicles no longer recognize.

Let us make the algorithm more generic so that it can apply to a Data Loss Prevention (DLP) system. Assume a simple, well-defined example: DLP via Domain Name System (DNS) queries. Instead of a photo, individual fields in protocol messages are analyzed to determine when malicious actors are trying to exfiltrate sensitive data, so in the algorithm we replace "photo" with "set of DNS queries" (a sketch follows the steps):

  1. Feed in the set of DNS queries we want to hack.
  2. Check the neural network’s prediction and see how far off it is from the answer we want to get for that set of DNS queries.
  3. Tweak our set of DNS queries using back-propagation to make the final prediction slightly closer to the answer we want to get.
  4. Repeat steps 1–3 a few thousand times with the same set of DNS queries until the network gives us the answer we want.
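
The loop carries over almost verbatim. The one wrinkle is that DNS queries are discrete strings, so the gradient must be taken with respect to a continuous feature vector extracted from the query set. Here is a toy sketch under that assumption; the two features and the tiny detector are entirely hypothetical stand-ins for a real DLP model.

    import math
    import torch
    import torch.nn as nn

    def featurize(queries):
        """Toy features for a query set: mean length, mean character entropy."""
        def entropy(s):
            counts = {c: s.count(c) for c in set(s)}
            return -sum(n / len(s) * math.log2(n / len(s)) for n in counts.values())
        lengths = [len(q) for q in queries]
        return torch.tensor([[sum(lengths) / len(lengths),
                              sum(entropy(q) for q in queries) / len(queries)]])

    # Stand-in detector (assumption): class 0 = benign, class 1 = exfiltration.
    detector = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
    detector.eval()
    BENIGN = 0

    queries = ["aGVsbG8.evil.example", "c2VjcmV0.evil.example"]  # step 1
    feats = featurize(queries).requires_grad_(True)

    for _ in range(2000):                  # step 4
        logits = detector(feats)           # step 2
        if logits.argmax(dim=1).item() == BENIGN:
            break
        loss = nn.functional.cross_entropy(logits, torch.tensor([BENIGN]))
        detector.zero_grad()
        loss.backward()                    # step 3: gradient w.r.t. the features
        with torch.no_grad():
            feats -= 0.05 * feats.grad.sign()
            feats.grad.zero_()

In practice the hard part is the final step this sketch leaves out: turning the perturbed feature vector back into real DNS queries that both evade the detector and still carry the exfiltrated data.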

With this methodology, the adversary can successfully bypass such a DLP system. Worse, imagine the adversary tampering with valid data (e.g., an organization's legitimate traffic) to cause the DLP to trigger false positives.

What can security vendors do to prevent such hacks? Obviously, the more the adversary knows about the neural network algorithm, the quicker they can generate hacked input that causes the system to fail, so algorithm details must be protected. Geitgey also recommends "adversarial training": generate lots of hacked images or data using back-propagation and include them, correctly labeled, in your training data set.
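
A rough sketch of what adversarial training looks like, again with a stand-in model and random tensors in place of a real training set: each batch is augmented with hacked versions of itself, generated by a single back-propagation step against the current model, and labeled with the true class.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1000))  # stand-in
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    def hack(images, labels, eps=0.03):
        """One back-propagation step against the current model (FGSM-style)."""
        images = images.clone().requires_grad_(True)
        loss = loss_fn(model(images), labels)
        loss.backward()
        # Move each pixel in the direction that INCREASES the loss.
        return (images + eps * images.grad.sign()).clamp(0, 1).detach()

    # Stand-in "dataset": one batch of random images with random labels.
    for images, labels in [(torch.rand(8, 3, 64, 64), torch.randint(0, 1000, (8,)))]:
        adv = hack(images, labels)             # hacked copies of the batch
        batch = torch.cat([images, adv])       # train on clean + hacked inputs
        targets = torch.cat([labels, labels])  # both keep the TRUE labels
        optimizer.zero_grad()
        loss_fn(model(batch), targets).backward()
        optimizer.step()

The design intent is simple: since the attacker's loop exploits the gradient, the defender trains the network on exactly the inputs that gradient produces, so those inputs no longer change its answer.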

So, the question arises: are we building enough security into our security systems?

Editor’s note: ISACA’s recent tech brief on artificial intelligence is available as a free download.

Claudia Johnson, Cloud Technologist, Oracle

[ISACA Now Blog]
