Abstract

As deep learning models have made remarkable strides in numerous fields, a variety of adversarial attack methods have emerged to interfere with them. Adversarial examples apply a minute perturbation to the original image that is imperceptible to humans yet induces a large error in the model. Existing attack methods achieve good results when the network structure is known, but their effectiveness against unknown structures still needs improvement. Transfer-based attacks have therefore become popular for their convenience and practicality: adversarial examples generated on a known model can be used to attack an unknown one. In this paper, we extract sensitive features with Grad-CAM and propose two single-step attack methods and one multi-step attack method that corrupt these features. Of the two single-step attacks, one corrupts features extracted from a single model and the other corrupts features extracted from multiple models. The multi-step attack improves on an existing attack method, enhancing the transferability of adversarial examples and achieving better results on unknown models. We validate our methods on CIFAR-10 and MNIST, obtaining a 1%-3% improvement in transferability.
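To make the core idea concrete, below is a minimal sketch of how a Grad-CAM saliency map can be used to concentrate a single-step (FGSM-style) perturbation on a model's sensitive features. It is written against standard PyTorch; the model, `target_layer`, and the mask-weighted sign-gradient step are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Sketch: Grad-CAM-guided single-step perturbation (assumed setup, not the paper's code).
import torch
import torch.nn.functional as F

def grad_cam_mask(model, target_layer, x, y):
    """Compute a Grad-CAM saliency mask highlighting sensitive image regions."""
    feats, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    logits = model(x)
    model.zero_grad()
    # Backpropagate the true-class scores to get gradients at the target layer.
    logits.gather(1, y.unsqueeze(1)).sum().backward()
    h1.remove(); h2.remove()
    # Standard Grad-CAM: channel weights are global-average-pooled gradients.
    w = grads[0].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((w * feats[0]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    return cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-8)

def single_step_attack(model, target_layer, x, y, eps=8 / 255):
    """One FGSM-style step, weighted by the Grad-CAM mask to corrupt sensitive features."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    mask = grad_cam_mask(model, target_layer, x.detach(), y)
    x_adv = x.detach() + eps * mask * grad.sign()
    return x_adv.clamp(0, 1)
```

The single-model variant described in the abstract corresponds to running this with one source model; a multi-model variant could, for instance, average the masks and gradients from several source models before taking the step.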
