Headless Horseman: Adversarial Attacks on Transfer Learning Models

Abstract

Transfer learning facilitates the training of task-specific classifiers using pre-trained models as feature extractors. We present a family of transferable adversarial attacks against such classifiers, generated without access to the classification head; we call these headless attacks. We first demonstrate successful transfer attacks against a victim network using only its feature extractor. This motivates the introduction of a label-blind adversarial attack. This transfer attack method does not require any information about the class-label space of the victim. Our attack lowers the accuracy of a ResNet18 trained on CIFAR10 by over 40%.
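
To make the idea concrete, below is a minimal sketch of a "headless," label-blind attack: a PGD-style perturbation that uses only a frozen feature extractor and pushes the adversarial image's features away from the clean image's features, never touching the victim's classification head or labels. The loss, step size, epsilon, and the ResNet18 backbone setup are illustrative assumptions, not the exact configuration from the paper.

```python
import torch
import torch.nn.functional as F
import torchvision

def headless_attack(feature_extractor, x, eps=8 / 255, step=2 / 255, iters=20):
    """PGD in feature space: maximize distance between adversarial and clean features."""
    feature_extractor.eval()
    with torch.no_grad():
        clean_feats = feature_extractor(x)           # features of the unperturbed input
    delta = torch.zeros_like(x, requires_grad=True)  # perturbation, kept inside an L_inf ball
    for _ in range(iters):
        adv_feats = feature_extractor(x + delta)
        loss = F.mse_loss(adv_feats, clean_feats)    # label-blind objective: push features apart
        loss.backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()            # ascend on the feature-distance loss
            delta.clamp_(-eps, eps)                      # project back into the epsilon ball
            delta.copy_((x + delta).clamp(0, 1) - x)     # keep the image in valid pixel range
        delta.grad.zero_()
    return (x + delta).detach()

# Example usage: a pre-trained ResNet18 with its final fully connected layer removed
# serves as the publicly available feature extractor; the victim's head is never queried.
backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1], torch.nn.Flatten())
x = torch.rand(1, 3, 224, 224)  # placeholder input batch
x_adv = headless_attack(feature_extractor, x)
```

The key design point this sketch illustrates is that the objective depends only on the shared feature extractor, so the resulting perturbation can transfer to any downstream classifier fine-tuned on top of those features.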

Publication
In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Oral presentation
