Twice attention networks for synthetic speech detection

Chen Chen, Yaozu Song, Deyun Chen, Bohan Dai

doi:10.1016/j.neucom.2023.126799

Abstract

Automatic speaker verification (ASV) systems are highly vulnerable to synthetic speech attacks, and artifacts are the key clue for distinguishing real speech from synthetic speech. In this paper, we focus on detecting these artifacts and propose the twice attention networks (TA-networks), an end-to-end network consisting of a feature extraction module and a back-end classifier. The feature extraction module, a twice attention Unet (TA-Unet), is the core of the TA-networks. It contains two sequential attention modules: (1) a five-layer U-shaped network with an attention gate, which first obtains the general contour of the artifacts, and (2) a softmax-based filter with an adaptive coefficient, which dynamically highlights the differences between frequencies; these differences can be regarded as finer-grained artifacts. After processing by the TA-Unet, the feature maps of real and synthetic speech are more discriminative for the back-end SCG-Res2Net50 classifier. Experimental results show that the TA-networks achieve an equal error rate of 1.62% on the ASVspoof 2019 logical access sub-challenge, which is significantly better than most of the other models in the comparison.
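
The equal error rate quoted in the abstract is the operating point at which the false acceptance rate (spoofed speech accepted as real) equals the false rejection rate (real speech rejected). The following is a minimal sketch of how such a figure can be computed from detector scores, not the authors' evaluation code. The function name `equal_error_rate`, the convention that a higher score means "more likely real", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for two lists of detector scores.

    Assumption: a higher score means the detector considers the
    utterance more likely to be real (bona fide) speech.
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: real speech scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: synthetic speech scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical toy scores, not data from the paper.
    real = [0.9, 0.8, 0.7, 0.4]
    fake = [0.6, 0.3, 0.2, 0.1]
    eer, threshold = equal_error_rate(real, fake)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

On a finite score set the two error rates rarely cross exactly, so this sketch reports the average of the two at the threshold where they are closest; evaluation toolkits may interpolate instead.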

Similar Papers

Vulnerability issues in Automatic Speaker Verification (ASV) systems
Priyanka Gupta ... Rodrigo Capobianco Guido
EURASIP Journal on Audio, Speech, and Music Processing | VOL. 2024
10 Feb 2024

Voice Spoofing Countermeasure for Synthetic Speech Detection
Farman Hassan ... Ali Javed
05 Apr 2021

Voice Presentation Attack Detection Using Convolutional Neural Networks
Ivan Himawan ... Srikanth Madikeri
01 Jan 2019

Deep Learning Serves Voice Cloning: How Vulnerable Are Automatic Speaker Verification Systems to Spoofing Trials?
Pavol Partila ... Miroslav Voznak
IEEE Communications Magazine | VOL. 58
01 Feb 2020
