Abstract

Voice Processing Systems (VPSes), now widely deployed, have become deeply involved in people’s daily lives, helping drive cars, unlock smartphones, make online purchases, and more. Unfortunately, recent research has shown that those systems based on deep neural networks are vulnerable to adversarial examples, which has drawn significant attention to VPS security. This review presents a detailed introduction to the background of adversarial attacks, including the generation of adversarial examples, psychoacoustic models, and evaluation metrics. We then provide a concise introduction to defense methods against adversarial attacks. Finally, we propose a systematic classification of adversarial attacks and defense methods, with which we hope to give beginners in this field a better understanding of the area’s structure.

Highlights

  • With the successful application of deep neural networks in the field of speech processing, automatic speech recognition systems (ASRs) and speaker recognition systems (SRSs) have become ubiquitous in our lives, including personal voice assistants (VAs)

  • To better illustrate the application of adversarial attacks and defenses in sound processing systems, we describe adversarial attacks in detail, including methods for generating adversarial examples and metrics for evaluating them

  • This study demonstrates the ability of adversarial attacks to deceive automatic speaker verification (ASV) systems

Summary

Introduction

With the successful application of deep neural networks in the field of speech processing, automatic speech recognition systems (ASRs) and speaker recognition systems (SRSs) have become ubiquitous in our lives, including personal voice assistants (VAs) (e.g., Apple Siri (https://www.apple.com/in/siri (accessed on 9 September 2021)), Amazon Alexa (https://developer.amazon.com/en-US/alexa (accessed on 9 September 2021)), Google Assistant (https://assistant.google.com/ (accessed on 9 September 2021)), and iFLYTEK (http://www.iflytek.com/en/index.html (accessed on 9 September 2021))), voiceprint recognition systems on mobile phones, bank self-service voice systems, and forensic testing [1]. These systems have brought great convenience to people’s personal and public lives and, to a certain extent, enable people to access help more efficiently and conveniently. However, recent research has shown that neural-network-based systems are vulnerable to adversarial attacks [2,3,4,5], which threaten personal identity information and property security and leave openings for criminals. The methods of adversarial defense are categorized according to their characteristics.
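To make the notion of an adversarial example concrete, the following is a minimal sketch of an FGSM-style additive perturbation on a raw waveform. It is illustrative only: the function name, the epsilon value, and the random surrogate gradient are assumptions for this sketch; a real attack would use the gradient of an ASR or SRS model's loss with respect to the input audio.

```python
import numpy as np

def fgsm_perturb(waveform, gradient, epsilon=0.002):
    """Add an L-infinity-bounded perturbation along the sign of the gradient.

    `gradient` stands in for d(loss)/d(input); here it is a toy surrogate,
    not the gradient of a real speech model.
    """
    delta = epsilon * np.sign(gradient)
    # Clip so the result stays a valid normalized audio signal in [-1, 1].
    adversarial = np.clip(waveform + delta, -1.0, 1.0)
    return adversarial, delta

# Toy example: a 1-second 440 Hz tone at 16 kHz and a random surrogate gradient.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000, endpoint=False)
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
grad = rng.standard_normal(clean.shape)

adv, delta = fgsm_perturb(clean, grad, epsilon=0.002)
```

The key property, shared by many of the attacks surveyed later, is that the perturbation is bounded (here in the L-infinity norm), so the adversarial audio remains nearly indistinguishable from the clean signal to a human listener while potentially changing the model's output.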

Attack
Adversarial Examples
Psychoacoustics
Metrics
Attack on ASRs
Attack on Speaker Recognition System
Defense against Adversarial Attacks
Attack Threat Model Taxonomy
Adversarial Knowledge
Adversarial Goal
Adversarial Perturbation Scope
Real or Simulated World
Defense
Defensive Result
Classification from the Content of Defense
From Different Areas of Defense Methods
Future Work
Conclusions