Replay Speech Detection Based on Dual-Input Hierarchical Fusion Network

Chenlei Hu, Ruohua Zhou, Qingsheng Yuan

doi:10.3390/app13095350

Abstract

Speech anti-spoofing is a crucial aspect of speaker recognition systems and has received a great deal of attention in recent years. Deep neural networks achieve satisfactory results on datasets whose training and testing data have similar distributions, but their generalization ability is limited when the distributions differ. In this paper, we propose a novel dual-input hierarchical fusion network (HFN) to improve the generalization ability of our model. The network has two inputs (the original speech signal and the time-reversed signal), which increases the volume and diversity of the training data. The hierarchical fusion model (HFM) fuses the two inputs after speech feature extraction, which enables more thorough fusion of information from different input levels and improves model performance. We evaluated the system on the ASVspoof 2021 PA (Physical Access) dataset, where it achieved an Equal Error Rate (EER) of 24.46% and a minimum tandem Detection Cost Function (min t-DCF) of 0.6708 on the test set. Compared with the four baseline systems in the ASVspoof 2021 competition, the proposed system reduced the min t-DCF by 28.9%, 31.0%, 32.6%, and 32.9%, and the EER by 35.7%, 38.1%, 45.4%, and 49.7%, respectively.
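The dual-input idea can be illustrated with a minimal sketch: given one waveform, produce the original signal and its time-reversed copy so that both can be passed to a feature extractor. This is not the authors' implementation; the function name `make_dual_inputs`, the use of NumPy, and the toy signal are assumptions made for illustration only.

```python
from typing import Tuple

import numpy as np


def make_dual_inputs(waveform: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Return (original, time_reversed) versions of a 1-D waveform.

    Hypothetical helper: the paper states that the network takes the
    original and the time-reversed signal, but does not give code.
    """
    original = np.asarray(waveform, dtype=np.float32)
    # Reverse along the time axis; copy so the result is contiguous in memory.
    time_reversed = original[::-1].copy()
    return original, time_reversed


# Toy example: a short ramp standing in for real speech samples.
toy = np.array([0.0, 0.1, 0.2, 0.3], dtype=np.float32)
orig, rev = make_dual_inputs(toy)
```

In a full system, each of the two signals would go through feature extraction before fusion; that stage is omitted here because the abstract does not specify it.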

Journal: Applied Sciences
Publication Date: Apr 25, 2023
Citations: 1
License type: CC BY 4.0

