Speaker Recognition using Gaussian Mixture Models

Juozas Kamarauskas

doi:10.5755/j01.eee.85.5.11155

Abstract

Gaussian mixture models are one of the most popular statistical methods in speaker recognition. The purpose of this research is to carry out speaker recognition experiments using various feature vectors: four formants, four formants with fundamental frequency, and mel-scale cepstral coefficients. Gaussian mixture models with mel-scale cepstral coefficients are the baseline in speaker recognition and give some of the best results in text-independent speaker recognition. The experimental results show that mel-scale cepstral coefficients and four formants with fundamental frequency give nearly the same recognition accuracy, but building the Gaussian mixture speaker models and running recognition take several times longer with mel-scale cepstral coefficients, because the number of calculations is several times greater in that case. Using only the four formants gives the worst recognition accuracy. Ill. 7, bibl. 12 (in English; summaries in English, Russian and Lithuanian).
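
The abstract does not show the author's implementation. As a rough illustration of the general idea behind Gaussian mixture speaker identification, the sketch below scores a sequence of feature vectors against one mixture model per speaker and picks the speaker with the highest average log-likelihood. Everything in it is an assumption made for illustration: the function names, the diagonal covariances, the hand-set two-component models, and the 2-D toy features, which stand in for real feature vectors such as mel-scale cepstral coefficients or formants with fundamental frequency. Model training is omitted.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of frames under one speaker's GMM.

    gmm is a list of (weight, mean, variance) tuples, one per component.
    """
    total = 0.0
    for x in frames:
        # Log-sum-exp over the mixture components for numerical stability.
        terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(frames)


def identify(frames, models):
    """Return the speaker whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))


# Hypothetical two-component models over 2-D toy features (not real speech data).
models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [6.0, 6.0], [1.0, 1.0]), (0.5, [8.0, 8.0], [1.0, 1.0])],
}
test_frames = [[0.1, -0.2], [1.9, 2.2], [0.5, 0.4]]
print(identify(test_frames, models))
```

In this sketch the scoring cost grows with the number of frames, the number of mixture components, and the feature dimension, which is one plausible reason a longer feature vector would take more computation than a five-element vector of formants and fundamental frequency.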
