Abstract

Speech synthesis technology has made great progress in recent years and is widely used in the Internet of Things, but it also brings the risk of being abused by criminals. Therefore, a line of research on audio forensics models has emerged to reduce or eliminate these negative effects. In this paper, we propose a black-box adversarial attack method that relies only on the output scores of audio forensics models. To improve the transferability of adversarial attacks, we utilize the ensemble-model method. Given the serious threat that adversarial examples pose to audio forensics models, we also design a defense method against our proposed attack. Our experimental results on four forensics models trained on the LA partition of the ASVspoof 2019 dataset show that our attacks achieve a 99% attack success rate on score-only black-box models, which is competitive with the best white-box attacks, and a 60% attack success rate on decision-only black-box models. Finally, our defense method reduces the attack success rate to 16% while maintaining 98% detection accuracy of the forensics models.
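The abstract does not spell out the attack algorithm; the following is a minimal sketch of a score-only black-box attack using finite-difference (NES-style) gradient estimation, which is one common way to attack a model from its output scores alone. The function name `score_fn` and all hyperparameters are assumptions for illustration, not the authors' exact method.

```python
# Sketch of a score-only black-box attack on an audio forensics model.
# Assumption: score_fn(x) returns the model's "fake" score for waveform x in [-1, 1];
# no gradients or internals of the model are used, only score queries.
import numpy as np

def estimate_gradient(score_fn, x, sigma=1e-3, n_samples=20):
    """Estimate d(score)/d(x) from score queries only (finite differences)."""
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = np.random.randn(*x.shape).astype(x.dtype)
        grad += (score_fn(x + sigma * u) - score_fn(x - sigma * u)) * u
    return grad / (2 * sigma * n_samples)

def score_only_attack(score_fn, x, eps=0.002, step=2e-4, iters=100):
    """Push a synthetic utterance toward the 'bonafide' decision within an L_inf ball."""
    x_adv = x.copy()
    for _ in range(iters):
        g = estimate_gradient(score_fn, x_adv)
        x_adv = x_adv - step * np.sign(g)          # lower the "fake" score
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay close to the original audio
        x_adv = np.clip(x_adv, -1.0, 1.0)
    return x_adv
```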

Highlights

  • Speech synthesis technologies have advanced significantly in recent years [1, 2]

  • A mainstream speech synthesis system generally consists of two parts: a spectrogram prediction network and a vocoder. The spectrogram prediction network converts text into mel spectrograms (see the sketch after this list)

  • The main contributions of this article can be summarized as follows: (i) To the best of our knowledge, we are the first to propose a black-box adversarial attack method relying only on the output distribution of audio forensics models, and we use the ensemble-model method to increase the transferability of adversarial examples in order to implement decision-only black-box attacks
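To make the two-stage structure mentioned in the highlight concrete, here is an illustrative pipeline: text is first mapped to a mel spectrogram, which a vocoder then converts to a waveform. All class and method names below are hypothetical placeholders with dummy logic, not a real TTS library API.

```python
# Illustrative two-stage speech synthesis pipeline (placeholder logic only).
import numpy as np

class SpectrogramPredictor:
    """Stands in for a text-to-mel model (e.g., a Tacotron-style network)."""
    def predict_mel(self, text: str) -> np.ndarray:
        n_frames = max(1, len(text)) * 5           # rough frame count per character
        return np.zeros((80, n_frames))            # 80-band mel spectrogram

class Vocoder:
    """Stands in for a neural vocoder that turns mel frames into audio samples."""
    def synthesize(self, mel: np.ndarray, hop_length: int = 256) -> np.ndarray:
        return np.zeros(mel.shape[1] * hop_length)  # waveform samples

def text_to_speech(text: str) -> np.ndarray:
    mel = SpectrogramPredictor().predict_mel(text)   # stage 1: text -> mel spectrogram
    return Vocoder().synthesize(mel)                 # stage 2: mel spectrogram -> waveform
```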


Summary

Introduction

Speech synthesis technologies have advanced significantly in recent years [1, 2]. Speech synthesis generally refers to the process of converting text into speech. Although there are studies that use the transferability of adversarial examples to achieve black-box attacks, they still rely on white-box models to generate the adversarial examples [31]. (i) To the best of our knowledge, we are the first to propose a black-box adversarial attack method relying only on the output distribution of audio forensics models, and we use the ensemble-model method to increase the transferability of adversarial examples in order to implement decision-only black-box attacks. The rest of the paper is organized as follows: Section 2 introduces several audio forensics models, which are the victim models in this paper; Section 3 describes the proposed adversarial attacks and defense methods; Section 4 introduces the experimental setup and results; and Section 5 gives the conclusion and future work.
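The ensemble-model idea used for transferability is not detailed in this summary; below is a minimal, self-contained sketch under the assumption that several surrogate forensics models are available as queryable score functions, whose estimated gradients are averaged so the resulting perturbation is more likely to transfer to an unseen, decision-only target model. All names and hyperparameters are illustrative.

```python
# Sketch of ensemble-based adversarial example generation for transfer attacks.
# Assumption: score_fns is a list of surrogate models' "fake"-score functions.
import numpy as np

def _est_grad(score_fn, x, sigma=1e-3, n=20):
    # Query-only finite-difference gradient estimate for one surrogate model.
    us = [np.random.randn(*x.shape).astype(x.dtype) for _ in range(n)]
    return sum((score_fn(x + sigma * u) - score_fn(x - sigma * u)) * u
               for u in us) / (2 * sigma * n)

def ensemble_attack(score_fns, x, eps=0.002, step=2e-4, iters=100):
    """Average gradient estimates over surrogates so the perturbation transfers."""
    x_adv = x.copy()
    for _ in range(iters):
        g = np.mean([_est_grad(f, x_adv) for f in score_fns], axis=0)
        x_adv = np.clip(x_adv - step * np.sign(g), x - eps, x + eps)
    return np.clip(x_adv, -1.0, 1.0)
```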

Audio Forensics Models
Audio Adversarial Examples Generation
Experiments
Findings
Conclusion