Abstract

With the development of speech synthesis and voice conversion techniques, the quality of artificially generated speech has improved significantly, and detecting such spoofed speech has become crucial to practical applications such as automatic speaker verification (ASV). State-of-the-art neural-network-based spoofing detection models can effectively distinguish most artificial utterances from natural ones in the latest ASVspoof 2019 evaluation. Motivated by recent progress in adversarial example generation, this paper studies the robustness of neural-network-based speech spoofing detectors against adversarial attacks. To this end, we propose an adversarial post-processing network (APN), which generates adversarial examples against a white-box anti-spoofing model by post-processing the speech waveforms produced by a baseline voice conversion system. Experimental results demonstrate that the proposed APNs are effective against the white-box anti-spoofing models that served as their adversarial targets during training. For example, the equal error rate (EER) of a fused detection model based on light convolutional neural networks (LCNNs) increased from 0.278% to 12.743% under the white-box condition, without degrading the subjective quality of the converted speech. Furthermore, the trained APNs also raised the EERs of detectors with unseen structures or unseen features in our experiments. These results indicate the threat that adversarial speech generation poses to state-of-the-art spoofing detection models.
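
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an equal error rate can be computed from detector scores. It is an illustration only, not the ASVspoof 2019 scoring tool or the authors' evaluation code. The function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for this example.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER as a fraction in [0, 1].

    Assumed convention (hypothetical): a higher score means the detector
    considers the utterance more likely to be bona fide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide utterances scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed utterances scored at or above it.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2


# Toy scores, invented for illustration only.
bonafide = [0.9, 0.8, 0.3, 0.7]
spoof_before = [0.1, 0.2, 0.6, 0.4]   # spoofed speech before any attack
spoof_after = [0.6, 0.7, 1.1, 0.9]    # same utterances scored higher after an attack

print(equal_error_rate(bonafide, spoof_before))
print(equal_error_rate(bonafide, spoof_after))
```

In this toy setting, pushing the spoofed scores toward the bona fide range raises the EER, which is the kind of effect an adversarial attack on a detector is measured by. The sketch picks the nearest crossing point on observed thresholds rather than interpolating, so its values can differ slightly from those of official scoring scripts.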
