Abstract

While traditional i-vector based methods remain popular in speaker recognition, deep learning has recently found increasing application in end-to-end models owing to its attractive performance. One effective practice is the integration of attention mechanisms into Convolutional Neural Networks (CNNs). In this work, a light-weight dual-path attention block is proposed by combining self-attention and the Convolutional Block Attention Module (CBAM), which helps capture multi-source features at negligible extra time cost. Additionally, a Weighted Cluster-Range Loss (WCRL) is proposed to enhance the identification performance of the Cluster-Range Loss (CRL) on indecisive samples. Furthermore, to address the low efficiency of CRL in the initial training stage, a novel Criticality-Enhancement Loss (CEL) is presented. Both proposed loss functions significantly improve training efficiency and overall recognition performance. Experimental results demonstrate the effectiveness of the proposed scheme, which achieves a competitive top-1 accuracy of 92.0%, top-5 accuracy of 97.6%, and an Equal Error Rate (EER) of 3.5% on the VoxCeleb1 dataset.
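For readers unfamiliar with the verification metric reported above, the sketch below shows one common way to estimate EER from trial scores: the decision threshold is swept, and EER is read off where the false acceptance and false rejection rates meet. This is an illustration of the metric only, not the authors' evaluation code; the function name and toy data are assumptions.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Estimate the Equal Error Rate (EER) from verification trial scores.

    scores: similarity scores for trial pairs (higher = more likely same speaker)
    labels: 1 for target (same-speaker) trials, 0 for impostor trials
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.unique(scores)  # sweep every observed score as a threshold
    far, frr = [], []
    for t in thresholds:
        accept = scores >= t
        far.append(np.mean(accept[labels == 0]))    # impostors accepted
        frr.append(np.mean(~accept[labels == 1]))   # targets rejected
    far, frr = np.array(far), np.array(frr)
    # EER is where FAR and FRR cross; take the point of closest approach.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Toy usage: 4 target trials and 4 impostor trials.
scores = [0.9, 0.8, 0.75, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(f"EER = {equal_error_rate(scores, labels):.3f}")  # 0.250 on this toy data
```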

Highlights

  • Most of the experiments in this work are conducted on the VoxCeleb1 dataset; VoxCeleb2 is used only for further evaluation of speaker verification, and CN-Celeb is employed only for the speaker identification task.

  • A light-weight dual-path attention module and two novel loss functions are proposed for text-independent speaker recognition.


Introduction

Variants of self-attention are emerging [28,29,30]. Attention mechanisms that consider the spatial and channel dimensions are widely employed, especially in the field of computer vision. Squeeze-and-Excitation (SE) blocks [31] implement a channel attention mechanism in which global spatial information is squeezed by global average pooling and channel-wise dependencies are obtained with fully-connected layers and a sigmoid function. Drawing on the recent success of self-attention and CBAM, we combine these two attention mechanisms in our work to form a Dual-path Attention (DA) block, as sketched below.
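To make the squeeze-and-excitation operation above concrete, here is a minimal PyTorch-style sketch of a channel attention block; it is not the authors' implementation, and the module name and reduction ratio are assumptions.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention (illustrative sketch).

    Squeeze: global average pooling collapses each channel's spatial map to
    a single statistic. Excitation: two fully-connected layers followed by a
    sigmoid produce per-channel weights that rescale the input features.
    """

    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio is an assumption
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)          # global average pooling
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                               # channel-wise weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)       # (B, C): squeezed global descriptor
        w = self.excite(w).view(b, c, 1, 1)  # (B, C, 1, 1): channel attention weights
        return x * w                         # rescale feature maps channel-wise

# Toy usage on a random feature map shaped (batch, channels, freq, time).
feats = torch.randn(2, 64, 40, 100)
print(SEBlock(64)(feats).shape)  # torch.Size([2, 64, 40, 100])
```

CBAM extends this channel attention with a subsequent spatial attention step; the DA block described above places such a convolutional attention path alongside a self-attention path.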
