Abstract
Although neural network-based speech recognition models have enjoyed significant success in many acoustic systems, they are susceptible to attack by adversarial examples. In this work, we take a first step towards using a generative adversarial network (GAN) to construct targeted speech adversarial examples. Specifically, we integrate the target speech recognition network into the GAN framework, which can then be formulated as a three-party game. The generator aims to produce a perturbation that makes the target network misclassify the input as a specific target label, while simultaneously fooling the discriminator into treating the adversarial example as a benign one. The discriminator aims to distinguish the crafted examples from the genuine samples. The target network is responsible for back-propagating its classification error via gradients to the generator for updating, but the target network itself is frozen. With a carefully designed network architecture, loss function and training strategy, we successfully train a generator that can produce the adversarial perturbation for a given speech clip and a target label. Experimental results show that the generated adversarial examples effectively fool state-of-the-art speech classification networks while attaining acceptable auditory perception quality. In addition, our proposed method runs much faster than the prevalent optimization-based schemes. To facilitate reproducible research, code, models and data are publicly available at https://github.com/winterwindwang/SpeechAdvGan.
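The two signals the generator receives in this three-party game can be illustrated with a minimal sketch. The abstract does not give the actual loss function, so everything here is an assumption made for illustration: the non-saturating GAN term, the cross-entropy term on the frozen target network's logits, the weight `alpha`, and the function names are all hypothetical and are not taken from the authors' code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generator_loss(disc_prob_genuine, target_logits, target_label, alpha=1.0):
    """Hypothetical combined generator objective (illustrative only).

    disc_prob_genuine: discriminator's probability that the adversarial
        example is a genuine sample (the generator wants this high).
    target_logits: logits of the frozen target network on the adversarial
        example (the generator wants the target label to win).
    alpha: assumed weight balancing the two terms.
    """
    eps = 1e-12  # avoids log(0)
    # Term 1: fool the discriminator (assumed non-saturating GAN loss).
    gan_term = -math.log(disc_prob_genuine + eps)
    # Term 2: classification error of the target network with respect to
    # the attacker's chosen label (assumed cross-entropy).
    probs = softmax(target_logits)
    cls_term = -math.log(probs[target_label] + eps)
    return gan_term + alpha * cls_term, gan_term, cls_term


if __name__ == "__main__":
    # A perturbation that fools both parties yields a lower loss ...
    good, _, _ = generator_loss(0.9, [0.1, 5.0, 0.2], target_label=1)
    # ... than one that is detected and classified as the wrong label.
    bad, _, _ = generator_loss(0.1, [5.0, 0.1, 0.2], target_label=1)
    print(good < bad)
```

In a real training loop, only the generator and discriminator parameters would be updated; the target network would contribute gradients through the second term while its own weights stay fixed, as the abstract describes.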
Highlights
Nowadays, the speech user interface is becoming one of the most prevalent means of human-machine interaction. It has been widely adopted in numerous size-constrained smart devices, wearable devices, and hands-free intelligent systems, where user input via a physical or on-screen keyboard is often inconvenient. These automatic speech recognition (ASR) systems depend on running a speech classification model in an always-on mode, receiving the voice and interpreting it as commands.
Many works have demonstrated that neural networks are susceptible to specially crafted speech adversarial examples, which are typically constructed by adding a peculiar perturbation to legitimate samples, causing the target acoustic system to misbehave.
Vaidya et al. [6] were the first to report that speech recognition systems are vulnerable to adversarial examples. They constructed the adversarial examples by adding perturbation to the Mel-Frequency Cepstral Coefficient (MFCC) transformed feature and converting the processed feature back to the waveform domain.
Summary
The speech user interface is becoming one of the most prevalent means of human-machine interaction. Vaidya et al. [6] constructed adversarial examples by adding perturbation to the Mel-Frequency Cepstral Coefficient (MFCC) transformed feature and converting the processed feature back to the waveform domain. Their generated samples were noise-dominated: they could not be interpreted by humans but remained intelligible to the speech recognition system. Carlini et al. [7] suggested perturbing the MFCC feature vector of the given speech. Their results showed that the constructed adversarial examples could effectively attack Hidden Markov Model-based ASR. Compared with recent state-of-the-art speech adversarial example generation schemes, the proposed GAN-based attack method efficiently generates adversarial examples with a pre-specified target label. The proposed targeted speech adversarial example generation using GAN is presented, with a thorough discussion of the network architecture, loss function and training strategy.