Abstract

Numerous studies have shown that deep learning models are vulnerable to adversarial examples. However, due to the high computational cost and the limited interpretability of deep neural networks (DNNs), research on poisoning attacks against deep models remains insufficient. In this work, building on the interpretation of DNNs as feature extractors, we propose a clean-label poisoning attack against DNNs that adds dominant image features to the original data. Each category of clean data is augmented with imperceptible perturbations, which we call the “dominant image features” of another category. These dominant features determine which class an input is assigned to, while the clean content behaves like noise. DNNs trained on the poisoned data achieve extremely low accuracy on clean test data. The attack can make personal data unlearnable for DNNs, thereby preserving individual privacy. Moreover, our data poisoning attack can be extended into a powerful clean-label backdoor attack, since the poisoned samples and their labels remain consistent to human observers. Experimental results show that the proposed attack outperforms previous state-of-the-art poisoning attacks, demonstrating its effectiveness. Additionally, our work provides a new perspective on the interpretation of DNNs as feature extractors.
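
To make the idea concrete, below is a minimal PyTorch-style sketch of how such a clean-label poisoning set might be constructed. It assumes the per-class dominant image features have already been generated as bounded perturbations; all names here (poison_dataset, dominant_features, eps) are illustrative assumptions, not the authors' actual implementation.

    import torch

    def poison_dataset(images, labels, dominant_features, eps=8 / 255):
        """Sketch of clean-label poisoning with class-wise dominant features.

        images:            (N, C, H, W) clean images in [0, 1]
        labels:            (N,) integer class labels (kept unchanged)
        dominant_features: (K, C, H, W) one precomputed perturbation per class
        """
        num_classes = dominant_features.shape[0]
        # Assign each clean class c the dominant feature of another class c'
        # (here a simple cyclic shift, purely for illustration).
        source_class = (labels + 1) % num_classes
        delta = dominant_features[source_class]
        # Bound the perturbation so it stays imperceptible, then add it.
        delta = delta.clamp(-eps, eps)
        poisoned = (images + delta).clamp(0.0, 1.0)
        # Labels are not modified, which is what makes the attack clean-label.
        return poisoned, labels

Under the abstract's claim, a network trained on the returned data would associate each class with its injected dominant feature rather than with the clean image content, so it generalizes poorly to clean test data.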
