Abstract

This paper presents a comparative analysis of two text-independent speaker identification techniques, one based on the Gaussian Mixture Model (GMM) and one based on the Support Vector Machine (SVM). The feature vector consists of Mel-frequency cepstral coefficients (MFCC), their delta derivatives ($\Delta$MFCC) and their double-delta derivatives ($\Delta\Delta$MFCC). The aim is to evaluate the accuracy of both models for different MFCC feature-vector sizes while fine-tuning the GMM and SVM models. With the proposed experimental setup, which uses a frame overlap of 75% and a feature vector of 20 MFCC + 20 $\Delta$MFCC + 20 $\Delta\Delta$MFCC coefficients, SVM performs better than GMM. On the prepared data set, the highest accuracy achieved by the SVM model is 100%, whereas that of the GMM model is 95.74%. The data set was prepared from voice samples of native speakers taken from the Voxforge website and from a personal survey. The diversity of the speech corpus indicates that the method performs equally well irrespective of language, gender, age, and region.
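To make the feature configuration concrete, the following is a minimal sketch of how a 75% frame overlap and a 60-dimensional vector (20 MFCC + 20 $\Delta$MFCC + 20 $\Delta\Delta$MFCC) could be assembled. It is illustrative only: the function names, the frame length of 400 samples, and the central-difference delta are assumptions and are not taken from the paper, which may use a different delta formula. The MFCC values themselves are taken as given here.

```python
def frame_signal(signal, frame_len, overlap=0.75):
    """Split a sequence into overlapping frames.

    With 75% overlap, consecutive frames start frame_len / 4 samples apart.
    """
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def delta(features):
    """Central difference along the time axis (assumed formula).

    `features` is a list of per-frame coefficient lists; edge frames
    reuse their nearest neighbour.
    """
    n = len(features)
    out = []
    for t in range(n):
        prev = features[max(t - 1, 0)]
        nxt = features[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def stack_features(mfcc):
    """Concatenate MFCC, delta and double-delta coefficients per frame."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return [m + a + b for m, a, b in zip(mfcc, d1, d2)]


# Hypothetical input: 1000 samples, frame length 400 (assumed values).
frames = frame_signal(list(range(1000)), frame_len=400)

# Placeholder "MFCC" matrix: 5 frames x 20 coefficients.
mfcc = [[float(t)] * 20 for t in range(5)]
stacked = stack_features(mfcc)
```

Each row of `stacked` holds 60 values per frame, and such rows are what would then be passed to the GMM or SVM classifier.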
