Abstract

Current methods based on traditional i-vectors and deep neural networks (DNNs) have shown effectiveness on the speaker identification task, especially on large-scale corpora. However, when the training dataset is small, overfitting can occur and degrade performance. Moreover, robust identification remains challenging even under less strict conditions. This paper proposes SpeakerGAN, a novel approach to speaker identification based on a conditional generative adversarial network (CGAN). The adversarial networks distinguish real from fake samples and predict class labels simultaneously. We configure the generator in SpeakerGAN with a gated convolutional neural network (CNN) and the discriminator with a modified residual network (ResNet) to obtain generated samples of high diversity and to increase network capacity. Multiple loss functions are combined and jointly optimized to encourage correct mapping and accelerate convergence. Experimental results show that SpeakerGAN reduces the classification error rate by 87% and 16% relative to a traditional i-vector system and a state-of-the-art DNN-based method, respectively. Under the scenario of limited training data, SpeakerGAN obtains significant improvements over the baselines. When only 1.6 s of speech per speaker is used for testing, SpeakerGAN achieves an identification accuracy of 98.20%, which suggests its promise for short-utterance speaker identification.
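To make the two architectural ideas in the abstract concrete, below is a minimal PyTorch sketch of (1) a gated convolution block of the kind used in the generator and (2) a discriminator with two output heads, one for real/fake discrimination and one for speaker-class prediction. This is not the authors' implementation: all layer sizes, the choice of PyTorch, and names such as `GatedConv1d`, `TwoHeadDiscriminator`, and `n_speakers` are illustrative assumptions, and the shared gated-convolution trunk here stands in for the paper's modified ResNet discriminator.

```python
# Hypothetical sketch of SpeakerGAN's two key ideas; not the paper's code.
import torch
import torch.nn as nn

class GatedConv1d(nn.Module):
    """Gated convolution: output = conv_feat(x) * sigmoid(conv_gate(x))."""
    def __init__(self, in_ch, out_ch, kernel=3):
        super().__init__()
        self.feat = nn.Conv1d(in_ch, out_ch, kernel, padding=kernel // 2)
        self.gate = nn.Conv1d(in_ch, out_ch, kernel, padding=kernel // 2)

    def forward(self, x):
        # The sigmoid gate controls how much of each feature passes through.
        return self.feat(x) * torch.sigmoid(self.gate(x))

class TwoHeadDiscriminator(nn.Module):
    """Shared trunk with a real/fake head and a speaker-classification head,
    so the network distinguishes real/fake samples and predicts class labels
    simultaneously, as the abstract describes."""
    def __init__(self, in_ch, n_speakers, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            GatedConv1d(in_ch, hidden),
            nn.AdaptiveAvgPool1d(1),   # pool over the time axis
            nn.Flatten(),
        )
        self.adv_head = nn.Linear(hidden, 1)           # real vs. fake logit
        self.cls_head = nn.Linear(hidden, n_speakers)  # speaker-label logits

    def forward(self, x):
        h = self.trunk(x)
        return self.adv_head(h), self.cls_head(h)

# Combined objective, per the abstract ("multiple loss functions are
# combined"); the equal weighting below is an assumption.
adv_loss = nn.BCEWithLogitsLoss()
cls_loss = nn.CrossEntropyLoss()

# Example: batch of 4 utterances, 40-dim features, 100 frames, 10 speakers.
d = TwoHeadDiscriminator(in_ch=40, n_speakers=10)
adv_logit, cls_logit = d(torch.randn(4, 40, 100))
```

The two-head design mirrors auxiliary-classifier GAN formulations, where the adversarial head shapes the sample distribution while the classification head supplies the speaker-identification signal.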
