Semi-black-box Attacks Against Speech Recognition Systems Using Adversarial Samples

Yi Wu,Jian Liu,Yingying Chen,Jerry Cheng

doi:10.1109/dyspan.2019.8935789

Abstract

As automatic speech recognition (ASR) systems have been integrated into a diverse set of devices around us in recent years, security vulnerabilities of them have become an increasing concern for the public. Existing studies have demonstrated that deep neural networks (DNNs), acting as the computation core of ASR systems, is vulnerable to deliberately designed adversarial attacks. Based on the gradient descent algorithm, existing studies have successfully generated adversarial samples which can disturb ASR systems and produce adversary-expected transcript texts designed by adversaries. Most of these research simulated white-box attacks which require knowledge of all the components in the targeted ASR systems. In this work, we propose the first semi-black-box attack against the ASR system - Kaldi. Requiring only partial information from Kaldi and none from DNN, we can embed malicious commands into a single audio chip based on the gradient-independent genetic algorithm. The crafted audio clip could be recognized as the embedded malicious commands by Kaldi and unnoticeable to humans in the meanwhile. Experiments show that our attack can achieve high attack success rate with unnoticeable perturbations to three types of audio clips (pop music, pure music, and human command) without the need of the underlying DNN model parameters and architecture.

Full Text