Abstract

The rapid development of machine learning has spurred the adoption of Automatic Speech Recognition (ASR). However, studies have shown that even state-of-the-art ASR systems remain vulnerable to various attacks, which severely undermines their reliability. Most existing attack techniques against ASR models target the white-box scenario, in which the adversary uses adversarial samples to build a substituted model corresponding to the target model; far fewer schemes address the black-box scenario. Moreover, no existing scheme considers how to construct the architecture of the substituted models. In this paper, we point out that constructing a good substituted-model architecture is crucial to the effectiveness of the attack, as it helps generate a more sophisticated set of adversarial examples. We evaluate the performance of different substituted models through comprehensive experiments and find that an ensemble of substituted models achieves the best attack effect. The experiments show that our approach attains an attack success rate of over 80% (a 2% improvement over the latest work) while preserving the authenticity of the original samples.
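To make the ensemble idea concrete, the following is a minimal sketch of how adversarial examples can be crafted against an ensemble of substitute models. Everything here is a hypothetical toy stand-in, not the paper's actual method: each "substituted model" is a simple linear classifier, and the perturbation is computed FGSM-style by averaging the loss gradients of the whole ensemble, with a per-coordinate bound `eps` standing in for the authenticity constraint on the original sample.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def ensemble_fgsm(x, label, models, eps=0.1):
    """Craft one adversarial example against an ensemble of toy linear models.

    For a linear model with weight matrix W, the gradient of the
    cross-entropy loss w.r.t. the input x is W.T @ (softmax(W @ x) - onehot).
    We average this gradient over all substitute models, then take a
    single signed step of size eps (FGSM).
    """
    grad = np.zeros_like(x)
    for W in models:
        p = softmax(W @ x)
        p[label] -= 1.0                 # softmax probabilities minus one-hot label
        grad += W.T @ p
    grad /= len(models)                 # average gradient over the ensemble
    # The perturbation is bounded by eps per coordinate, so the adversarial
    # sample stays close to the original input.
    return x + eps * np.sign(grad)

rng = np.random.default_rng(0)
models = [rng.standard_normal((3, 5)) for _ in range(4)]  # 4 toy substitute models
x = rng.standard_normal(5)
x_adv = ensemble_fgsm(x, label=0, models=models)
print(np.max(np.abs(x_adv - x)))  # never exceeds eps = 0.1
```

Averaging gradients over several substitute architectures is one common way to improve the transferability of adversarial examples to an unseen target model; real audio attacks would replace the linear toys with trained ASR substitutes and add perceptual constraints.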
