Abstract

Speaker recognition is susceptible to the speaker's emotional state, which often degrades recognition performance. We propose a framework that combines generative adversarial networks with speaker recognition to generate additional speaker-related emotional training speech features, enhancing robustness under different emotional conditions. In this framework, a new speaker emotion-converted generative adversarial network (SEC-GAN) is developed for speaker recognition. Given neutral speech from the target speaker, SEC-GAN learns to generate speech features in other emotions while retaining the speaker's identity. In addition, a new loss function is designed to preserve speaker-internal information during feature reconstruction, and an emotion discriminator is introduced to classify the emotion of the generated speech features, improving the quality of emotion generation. Using the original neutral data and the generated training data from native speakers in the Mandarin Affective Speech Corpus (MASC), our framework reduces the negative impact of emotion mismatch between enrollment and test speech. This strategy addresses a common real-world problem: most voice-controlled devices enroll a user's calm speech but fail to recognize the user's identity when the user speaks in another emotional state. On MASC, our framework achieves 57.59% accuracy, an improvement of 8.27% over the VGG baseline and 5.62% over the x-vector system. Our framework also outperforms the existing state-of-the-art method ECAPA-TDNN and other comparison methods.
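To make the abstract's training objective concrete, below is a minimal sketch of what a SEC-GAN-style generator loss could look like: an adversarial term, an emotion-classification term from an emotion discriminator, and a speaker-identity-preservation term. All module names, dimensions, and loss weights here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical SEC-GAN-style generator loss (illustrative only, not the
# authors' code). Assumes per-frame spectral features of dimension FEAT_DIM,
# a fixed emotion inventory, and a speaker encoder whose embeddings are used
# to penalize loss of speaker identity during emotion conversion.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_EMOTIONS, FEAT_DIM, EMB_DIM = 5, 80, 128

class Generator(nn.Module):
    """Maps neutral features + a target-emotion code to emotion-converted features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM + N_EMOTIONS, 256), nn.ReLU(),
            nn.Linear(256, FEAT_DIM),
        )
    def forward(self, x, emo_onehot):
        return self.net(torch.cat([x, emo_onehot], dim=-1))

class Discriminator(nn.Module):
    """Shared trunk with a real/fake head and an emotion-classification head."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU())
        self.adv = nn.Linear(256, 1)           # real/fake logit
        self.emo = nn.Linear(256, N_EMOTIONS)  # emotion logits
    def forward(self, x):
        h = self.trunk(x)
        return self.adv(h), self.emo(h)

class SpeakerEncoder(nn.Module):
    """Produces a normalized speaker embedding for the identity-preservation loss."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, EMB_DIM))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

G, D, E = Generator(), Discriminator(), SpeakerEncoder()

def generator_loss(neutral, emo_onehot, lambda_emo=1.0, lambda_spk=10.0):
    fake = G(neutral, emo_onehot)
    adv_logit, emo_logit = D(fake)
    # Adversarial term: generated features should fool the real/fake head.
    l_adv = F.binary_cross_entropy_with_logits(adv_logit, torch.ones_like(adv_logit))
    # Emotion term: generated features should classify as the target emotion.
    l_emo = F.cross_entropy(emo_logit, emo_onehot.argmax(dim=-1))
    # Identity term: the speaker embedding should survive the conversion.
    l_spk = 1.0 - F.cosine_similarity(E(neutral), E(fake), dim=-1).mean()
    return l_adv + lambda_emo * l_emo + lambda_spk * l_spk

# Example: a batch of 4 neutral frames converted toward emotion index 2.
x = torch.randn(4, FEAT_DIM)
emo = F.one_hot(torch.full((4,), 2), N_EMOTIONS).float()
print(generator_loss(x, emo))
```

The converted features produced by such a generator would then be added to the neutral enrollment data to train the downstream speaker-recognition model, which is the augmentation strategy the abstract describes.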
