Abstract

Attackers can generate adversarial examples (AEs) that stealthily mislead automatic speech recognition (ASR) models, raising significant concerns about the security of intelligent voice control (IVC) devices. Existing adversarial attacks mainly generate AEs that mislead ASR models into outputting specific target English commands (e.g., open the door). However, it remains unknown whether AEs can be used to issue commands in other languages against commercial black-box ASR models. In this paper, we propose adversarial attacks on commercial multilingual ASR models, taking Chinese phrases (e.g., 支付宝付款, which means “Alipay payment”) and “Chinese-English code-switching” phrases (e.g., 关闭GPS, which means “turn off GPS”) as the target commands. We call a multilingual speech recognition model that can recognize both Chinese and English a Chinese-English speech recognition model. Specifically, we generate transferable AEs based on the open-source conventional DataTang Mandarin ASR model. Given 55 target commands, the success rate of generating AEs for them reaches up to 96% for the Aliyun ASR API and up to 80% for the Tencentyun ASR API. Our AEs can trigger actual attack actions on voice assistants (e.g., Apple Siri, Xiaomi Xiaoaitongxue) or spread malicious messages through ASR API services, while the target commands in the AEs are inaudible to human beings. Finally, by analyzing the spectrum differences between benign audio clips and AEs, we propose a general defense against adversarial audio attacks.
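The abstract does not specify which spectral features the defense relies on. As an illustration only, the following minimal sketch shows one way a spectrum difference between two clips could be quantified: the fraction of signal energy at or above a cutoff frequency. The function name `band_energy_ratio`, the 4 kHz cutoff, and the synthetic sine-wave signals are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def band_energy_ratio(signal, sample_rate, split_hz=4000.0):
    """Return the fraction of spectral energy at or above split_hz.

    Hypothetical feature for illustration; not the paper's defense.
    """
    power = np.abs(np.fft.rfft(signal)) ** 2          # power per frequency bin
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0:
        return 0.0                                    # silent clip: no energy anywhere
    return float(power[freqs >= split_hz].sum() / total)

# Synthetic stand-ins for audio (assumed, not real speech or real AEs).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate              # one second of samples
clean = np.sin(2 * np.pi * 300 * t)                   # low-frequency tone only
perturbed = clean + 0.5 * np.sin(2 * np.pi * 6000 * t)  # added high-frequency component

clean_ratio = band_energy_ratio(clean, sample_rate)
perturbed_ratio = band_energy_ratio(perturbed, sample_rate)
```

In this toy setup the perturbed signal carries a visibly larger share of high-frequency energy than the clean one. Whether real AEs differ from benign clips in this particular way is not stated in the abstract.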
