Adversarial Attack and Defense for Commercial Black-box Chinese-English Speech Recognition Systems

Xuejing Yuan, Jiangshan Zhang, Kai Chen, Cheng'An Wei, Ruiyuan Li, Zhenkun Ma, Xinqi Ling

doi:10.1145/3701725

Abstract

Attackers can generate adversarial examples (AEs) that stealthily mislead automatic speech recognition (ASR) models, raising significant concerns about the security of intelligent voice control (IVC) devices. Existing adversarial attacks mainly generate AEs that mislead ASR models into outputting specific English target commands (e.g., open the door). However, it remains unknown whether AEs can be used to issue commands in other languages to attack commercial black-box ASR models. In this paper, taking Chinese phrases (e.g., 支付宝付款, meaning “Alipay payment”) and “Chinese-English code-switching” phrases (e.g., 关闭GPS, meaning “turn off GPS”) as the target commands, we propose adversarial attacks for commercial multilingual ASR models. We call a multilingual speech recognition model that can recognize both Chinese and English a Chinese-English speech recognition model. Specifically, we generate transferable AEs based on the open-sourced conventional DataTang Mandarin ASR model. Given 55 target commands, the success rate of generating AEs for them is up to 96% and 80% for the Aliyun ASR API and the Tencentyun ASR API, respectively. Our AEs can trigger actual attack actions on voice assistants (e.g., Apple Siri, Xiaomi Xiaoaitongxue) or spread malicious messages through ASR API services, while the target commands in the AEs are inaudible to human beings. Finally, by analyzing the spectrum differences between benign audio clips and AEs, we propose a general defense against adversarial audio attacks.
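The abstract does not say which spectral features the defense relies on, so the following is only a minimal illustrative sketch of what "analyzing spectrum differences" between two clips can look like, not the authors' method. It assumes, purely for illustration, that a perturbation adds energy above some cutoff frequency. The signals are synthetic sine waves, and the names `magnitude_spectrum`, `high_band_energy_ratio`, the 8 kHz sample rate, and the 2 kHz cutoff are all hypothetical choices.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(n^2) DFT magnitudes for bins 0..n//2 (adequate for short clips)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def high_band_energy_ratio(samples, sample_rate, cutoff_hz):
    """Fraction of one-sided spectral energy at or above cutoff_hz."""
    n = len(samples)
    energies = [m * m for m in magnitude_spectrum(samples)]
    total = sum(energies)
    if total == 0:
        return 0.0
    # Bin k corresponds to frequency k * sample_rate / n.
    high = sum(e for k, e in enumerate(energies)
               if k * sample_rate / n >= cutoff_hz)
    return high / total


# Synthetic stand-ins (assumptions, not data from the paper):
# a 250 Hz tone as the "benign" clip, and the same tone plus a
# weaker 3000 Hz component as the "perturbed" clip.
SAMPLE_RATE = 8000
N = 256
benign = [math.sin(2 * math.pi * 250 * t / SAMPLE_RATE) for t in range(N)]
perturbed = [
    b + 0.3 * math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE)
    for t, b in enumerate(benign)
]

benign_ratio = high_band_energy_ratio(benign, SAMPLE_RATE, cutoff_hz=2000)
perturbed_ratio = high_band_energy_ratio(perturbed, SAMPLE_RATE, cutoff_hz=2000)
print(f"benign: {benign_ratio:.4f}, perturbed: {perturbed_ratio:.4f}")
```

In this toy setup the perturbed clip carries a visibly larger share of its energy above the cutoff than the benign one. A real detector would need features and thresholds derived from actual benign audio and AEs, which the abstract does not specify.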

More From: ACM Transactions on Privacy and Security

Similar Papers

Model Access Control Based on Hidden Adversarial Examples for Automatic Speech Recognition
Haozhe Chen ... Jie Zhang
IEEE Transactions on Artificial Intelligence | VOL. 5
01 Mar 2024

ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture
Gaofeng Cheng ... Yonghong Yan
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 30
01 Jan 2021

OkwuGbé: End-to-End Speech Recognition for Fon and Igbo
21 Oct 2021

Recognition of target domain Japanese speech using language model replacement
Daiki Mori ... Norihide Kitaoka
EURASIP Journal on Audio, Speech, and Music Processing | VOL. 2024
20 Jul 2024
