Deep Neural Networks (DNNs) are vulnerable to adversarial perturbations: small
changes deliberately crafted on the input to mislead the model into wrong
predictions. Adversarial attacks can have disastrous consequences for deep
learning-empowered critical applications. Existing defense and detection
techniques both require extensive knowledge of the model, testing inputs, and
even execution details. They are not viable for general deep learning
implementations where the model internals are unknown, a common ‘black-box’
scenario for model users. Inspired by the fact that electromagnetic (EM)
emanations of a model inference depend on both operations and data and
may contain footprints of different input classes, we propose a framework,
EMShepherd, to capture EM traces of model execution, process the traces,
and exploit them for adversarial detection. Only benign samples and
their EM traces are used to train the adversarial detector: a set of EM
classifiers and class-specific unsupervised anomaly detectors. When the victim
model is attacked with an adversarial example, its execution differs from
executions for the known classes, and so does the EM trace.
We demonstrate that our air-gapped EMShepherd can effectively
detect different adversarial attacks on a commonly used FPGA deep learning
accelerator for both the Fashion MNIST and CIFAR-10 datasets. It achieves a 100%
detection rate on most types of adversarial samples, which is comparable to the
state-of-the-art ‘white-box’ software-based detectors.
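
The benign-only training idea described above can be illustrated with a minimal sketch. It is not the EMShepherd implementation: the synthetic Gaussian features stand in for processed EM traces, a nearest-centroid rule stands in for the EM classifiers, and a per-class distance threshold stands in for the class-specific anomaly detectors. All names (`make_traces`, `BenignOnlyDetector`, the 0.99 quantile) are hypothetical choices made for illustration.

```python
import numpy as np


def make_traces(rng, n_per_class, centers, noise=0.1):
    """Synthetic stand-in for EM trace features: one Gaussian cluster per class."""
    X = np.concatenate(
        [c + noise * rng.standard_normal((n_per_class, c.size)) for c in centers]
    )
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y


class BenignOnlyDetector:
    """Nearest-centroid classifier plus one distance threshold per class.

    Assumption: this simple pair replaces the classifiers and unsupervised
    anomaly detectors named in the text; it is fit on benign data only.
    """

    def fit(self, X, y, quantile=0.99):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        # Per-class threshold: a high quantile of benign distances to the centroid.
        self.thresholds_ = np.array([
            np.quantile(np.linalg.norm(X[y == c] - self.centroids_[i], axis=1), quantile)
            for i, c in enumerate(self.classes_)
        ])
        return self

    def nearest_class(self, X):
        # Distance from every sample to every class centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return d.argmin(axis=1), d.min(axis=1)

    def is_adversarial(self, X):
        # Flag a trace if it is anomalous with respect to its closest known class.
        idx, dist = self.nearest_class(X)
        return dist > self.thresholds_[idx]


rng = np.random.default_rng(0)
centers = [np.zeros(8), np.full(8, 5.0)]           # two hypothetical input classes
X_train, y_train = make_traces(rng, 200, centers)  # benign traces only
det = BenignOnlyDetector().fit(X_train, y_train)

benign_test, _ = make_traces(rng, 50, centers)
# Traces that match neither known class, standing in for adversarial executions.
odd = np.full((10, 8), 2.5) + 0.1 * rng.standard_normal((10, 8))

print("benign flagged fraction:", det.is_adversarial(benign_test).mean())
print("all odd traces flagged:", bool(det.is_adversarial(odd).all()))
```

The point of the sketch is only that no adversarial samples are needed at training time: the thresholds are derived from benign traces, and anything falling outside every class's benign region is flagged.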