Although deep neural networks (DNNs) have demonstrated extraordinary performance in computer vision, they are evidently vulnerable to adversarial attacks crafted with human-imperceptible perturbations. Most existing adversarial attacks focus on invading target deep task models by enhancing input diversity via image rotation, warping, or other transformations to improve adversarial transferability. Such approaches concentrate on operations over the original inputs and disregard the properties of different source information. This observation inspires us to utilize source-agnostic information and integrate generated features with the raw inputs to enrich adversarial properties. To this end, we propose a simple and flexible adversarial attack, the source-agnostic Feature Inducing Method (FIM), for improving the transferability of adversarial examples (AEs). FIM first generates perturbed features by imitating diverse patterns from multi-domain sources. Instead of exploiting the diversity of the original inputs, the proposed method obtains diverse properties through random feature imitation with respect to different source distributions. FIM then optimizes the generated features under norm bounds and integrates the original inputs with the imitative features. This diversifies the positive, class-general features of the raw inputs and reduces the impact of class-specific patterns on cross-model transferability. Based on the crafted features, FIM employs an adaptive gradient-based strategy to generate perturbations, which reduces the probability of falling into local optima when searching for the decision boundaries of the source and target models. We conduct detailed experiments comparing the proposed approach with existing baselines on three public datasets. The results show that the proposed method fools source and target task models by a considerable margin in most adversarial scenarios. We further investigate attacks on adversarially defended models (trained with adversarial training and TRADES). The proposed attack strategy achieves better attack quality by a margin of over 3.00% on CIFAR10 and reduces the robust accuracy of adversarially trained models by a large margin of nearly 9.00% on MNIST. Furthermore, we apply the proposed attack strategy to feature-level adversarial domains and evaluate its feasibility when integrated with various attack mechanisms, gaining over 20.00% better adversarial effectiveness than the base attacks on the studied deep task models.
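The abstract does not provide implementation details, so the following is only a minimal sketch of how a feature-imitation step could be combined with a momentum-based iterative gradient attack under L_inf norm bounds. The function names (imitate_features, fim_attack), the mixing coefficient alpha_mix, and the MI-FGSM-style update are assumptions made for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F


def imitate_features(x, source_batch, eps_feat=0.1):
    """Hypothetical feature-imitation step: nudge each input toward a randomly
    drawn sample from a different source distribution, bounded in L_inf norm."""
    idx = torch.randint(0, source_batch.size(0), (x.size(0),), device=x.device)
    delta = (source_batch[idx] - x).clamp(-eps_feat, eps_feat)  # norm-bounded imitative feature
    return x + delta


def fim_attack(model, x, y, source_batch, eps=8 / 255, steps=10, mu=1.0, alpha_mix=0.5):
    """Sketch of a feature-inducing attack: mix the running adversarial input with
    imitative features, then apply a momentum-based iterative gradient update."""
    step_size = eps / steps
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)
    for _ in range(steps):
        # Integrate the current adversarial input with freshly imitated features.
        x_mix = alpha_mix * x_adv + (1 - alpha_mix) * imitate_features(x_adv, source_batch)
        x_mix = x_mix.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_mix), y)
        grad, = torch.autograd.grad(loss, x_mix)
        # Momentum accumulation stabilizes the update direction across iterations.
        g = mu * g + grad / (grad.abs().mean(dim=(1, 2, 3), keepdim=True) + 1e-12)
        x_adv = x_adv + step_size * g.sign()
        # Project back into the eps-ball around the clean input and the valid pixel range.
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0.0, 1.0).detach()
    return x_adv
```

In this sketch, source_batch would hold images drawn from a different source distribution (e.g., another dataset or domain), and the eps-ball projection keeps the final perturbation imperceptible; the adaptive gradient strategy described in the abstract is approximated here by a fixed step of eps/steps with momentum.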