Abstract

Trojan attacks on deep neural networks (DNNs) exploit a <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">backdoor</i> embedded in a DNN model that can hijack any input with an attacker’s chosen signature trigger. Emerging defence mechanisms are mainly designed and validated on vision domain tasks (e.g., image classification) on 2D Convolutional Neural Network (CNN) model architectures; a defence mechanism that is general across vision, text, and audio domain tasks is demanded. This work designs and evaluates a run-time Trojan detection method exploiting <underline xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">STR</u> ong <underline xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">I</u> ntentional <underline xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">P</u> erturbation of inputs that is a multi-domain input-agnostic Trojan detection defence across <underline xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Vi</u> sion, <underline xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">T</u> ext and <underline xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">A</u> udio domains—thus termed as STRIP-ViTA. Specifically, STRIP-ViTA is demonstratively independent of not only task domain but also model architectures. Most importantly, unlike other detection mechanisms, it requires neither machine learning expertise nor expensive computational resource, which are the reason behind DNN model outsourcing scenario—one main attack surface of Trojan attack. We have extensively evaluated the performance of STRIP-ViTA over: i) CIFAR10 and GTSRB datasets using 2D CNNs for vision tasks; ii) IMDB and consumer complaint datasets using both LSTM and 1D CNNs for text tasks; and iii) speech command dataset using both 1D CNNs and 2D CNNs for audio tasks. Experimental results based on more than 30 tested Trojaned models (including publicly Trojaned model) corroborate that STRIP-ViTA performs well across all nine architectures and five datasets. Overall, STRIP-ViTA can effectively detect trigger inputs with small false acceptance rate (FAR) with an acceptable preset false rejection rate (FRR). In particular, for vision tasks, we can always achieve a 0 percent FRR and FAR given strong attack success rate always preferred by the attacker. By setting FRR to be 3 percent, average FAR of 1.1 and 3.55 percent are achieved for text and audio tasks, respectively. Moreover, we have evaluated STRIP-ViTA against a number of advanced backdoor attacks and compare its effectiveness with other recent state-of-the-arts.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.