Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification

Ming Li, Shrikanth Narayanan

doi:10.1016/j.csl.2014.02.004

Abstract

This paper presents a simplified and supervised i-vector modeling approach with applications to robust and efficient language identification and speaker verification. First, by concatenating the label vector and the linear regression matrix at the end of the mean supervector and the i-vector factor loading matrix, respectively, the traditional i-vectors are extended to label-regularized supervised i-vectors. These supervised i-vectors are optimized not only to reconstruct the mean supervectors well but also to minimize the mean square error between the original and the reconstructed label vectors, which makes them more discriminative with respect to the label information. Second, factor analysis (FA) is performed on the pre-normalized centered GMM first-order statistics supervector to ensure that each Gaussian component's statistics sub-vector is treated equally in the FA, which reduces the computational cost by a factor of 25 in the simplified i-vector framework. Third, since the entire matrix inversion term in the simplified i-vector extraction depends on only a single variable (the total frame number), we build a global table of the resulting matrices indexed by the log values of the frame numbers. Using this lookup table, each utterance's simplified i-vector extraction is further sped up by a factor of 4 at the cost of only a small quantization error. Finally, a simplified version of the supervised i-vector modeling is proposed to enhance both robustness and efficiency. The proposed methods are evaluated on the DARPA RATS dev2 task, the NIST LRE 2007 general task and the NIST SRE 2010 female condition 5 task for noisy channel language identification, clean channel language identification and clean channel speaker verification, respectively.
For language identification on the DARPA RATS task, the simplified supervised i-vector modeling achieved 2%, 16%, and 7% relative equal error rate (EER) reductions on three different feature sets, together with a speed-up of more than 100 times over the baseline i-vector method for the 120s task. Similar results were observed on the NIST LRE 2007 30s task, with a 7% relative average cost reduction. Results also show that the use of Gammatone frequency cepstral coefficients, Mel-frequency cepstral coefficients and spectro-temporal Gabor features in conjunction with shifted-delta-cepstral features improves the overall language identification performance significantly. For speaker verification, the proposed supervised i-vector approach outperforms the i-vector baseline by 12% and 7% relative in terms of EER and norm old minDCF values, respectively.
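
The lookup-table idea in the abstract can be illustrated with a small sketch. The abstract only states that the matrix inversion term depends on the total frame number; the specific form inv(I + n*A) used here, the toy matrix A, the grid range, and the function names build_table and lookup are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def build_table(A, n_min, n_max, num_bins):
    """Precompute inv(I + n*A) on frame counts evenly spaced in log(n).

    The form inv(I + n*A) is an assumed stand-in for the inversion term.
    """
    dim = A.shape[0]
    log_grid = np.linspace(np.log(n_min), np.log(n_max), num_bins)
    table = np.stack(
        [np.linalg.inv(np.eye(dim) + np.exp(g) * A) for g in log_grid]
    )
    return log_grid, table


def lookup(log_grid, table, n_frames):
    """Return the stored matrix whose grid point is nearest to log(n_frames)."""
    idx = int(np.argmin(np.abs(log_grid - np.log(n_frames))))
    return table[idx]


# Hypothetical toy setup: a random loading matrix and the fixed matrix A
# that would be shared by every utterance.
rng = np.random.default_rng(0)
T = rng.standard_normal((200, 10))
A = T.T @ T / 200

log_grid, table = build_table(A, n_min=10, n_max=100000, num_bins=400)

# Compare the table entry with the exact inverse for one frame count.
n = 3217
exact = np.linalg.inv(np.eye(10) + n * A)
approx = lookup(log_grid, table, n)
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
```

Spacing the grid evenly in log(n) keeps the relative gap between a query and its nearest grid point roughly constant across short and long utterances, so the quantization error stays small without a very large table. A per-utterance matrix inversion is replaced by a single table read.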

Journal: Computer Speech & Language
Publication Date: Mar 12, 2014
Citations: 51

Similar Papers

Multilingual Speech Corpus in Low-Resource Eastern and Northeastern Indian Languages for Speaker and Language Identification
Joyanta Basu ... Tapan Kumar Basu
Circuits, Systems, and Signal Processing | VOL. 40
20 Apr 2021

Speaker verification using simplified and supervised i-vector modeling
Ming Li ... Shrikanth S Narayanan
01 May 2013

Mel Frequency Cepstral Coefficients (MFCC) based speaker identification in noisy environment using wiener filter
Paresh M Chauhan ... Nikita P Desai
01 Mar 2014

Robust language and speaker identification using image processing techniques combined with PCA
Deepak Joshi ... Shiv Dutt Joshi
01 Dec 2013
