Abstract

It is well known that the performance of Gaussian mixture model (GMM)-based text-independent speaker identification systems deteriorates significantly in the presence of noise and spectral distortion in the training and testing utterances. In this paper, we propose a novel GMM-based speaker identification system based on two robust-statistics estimation methods: the minimum volume ellipsoid method and the minimum covariance determinant method. Compared to the traditional maximum likelihood (ML) estimation method, the proposed methods are less sensitive to outliers in the feature-vector space caused by additive noise and spectral distortion. Moreover, in the testing phase, we propose a simple distance metric for comparing the unknown testing utterance against the speakers’ models. Furthermore, we derive a more robust version of the i-vector extractor, named robust i-vector, which uses our proposed robust estimation methods to estimate the parameters of the base universal background model (UBM). The proposed classification system has been applied to the NIST 2000 speaker recognition evaluation and the COSINE database. It has also been compared against state-of-the-art techniques such as the GMM/UBM method, the super-vectors method, and the i-vector methods. Experimental results show that the proposed classification system provides up to 16% relative improvement in identification performance over the i-vector methods for short utterances in the NIST 2000 database, and up to 8% when the utterances of the NIST 2000 database are contaminated by different types of artificial noise at signal-to-noise ratios ranging from 0 to 20 dB. For the COSINE database, the robust i-vector estimation provides an absolute improvement of up to 8%. Finally, the real-time factor of the proposed distance metric for testing is 55% higher than that of regular ML scoring.
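To illustrate why a minimum covariance determinant (MCD) estimate is less sensitive to outliers than an ML estimate, the following is a minimal one-dimensional sketch, not the system described in the paper. The data are synthetic, the outlier cluster is a hypothetical stand-in for noise-corrupted feature values, and the helper name `mcd_1d` is an assumption introduced here. It relies on the fact that in one dimension the exact MCD subset is a contiguous window of the sorted data.

```python
import random
import statistics


def mcd_1d(values, h):
    """Exact univariate MCD: the h-point subset with the smallest variance.

    In one dimension the optimal subset is a contiguous window of the
    sorted data, so one scan over all windows of length h is enough.
    Returns the raw (uncorrected) mean and variance of that subset.
    """
    xs = sorted(values)
    best = min(
        (xs[i:i + h] for i in range(len(xs) - h + 1)),
        key=statistics.pvariance,
    )
    return statistics.fmean(best), statistics.pvariance(best)


random.seed(0)
# Synthetic "clean" feature values and a hypothetical cluster of outliers.
clean = [random.gauss(0.0, 1.0) for _ in range(200)]
outliers = [random.gauss(8.0, 0.5) for _ in range(40)]
data = clean + outliers

# ML estimates use every point, so the outliers pull them away from the clean data.
ml_mean = statistics.fmean(data)
ml_var = statistics.pvariance(data)

# MCD uses only the tightest subset of about half the points.
h = (len(data) + 2) // 2
mcd_mean, mcd_var = mcd_1d(data, h)

print(f"ML : mean={ml_mean:.3f}, var={ml_var:.3f}")
print(f"MCD: mean={mcd_mean:.3f}, var={mcd_var:.3f}")
```

The raw MCD variance underestimates the spread of the clean data because it is computed from the most concentrated half of the points; practical estimators usually apply a consistency correction, which this sketch omits.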
