Abstract
This paper presents a comparative analysis of text-independent speaker identification using Gaussian Mixture Models (GMM) and Support Vector Machines (SVM), each evaluated separately. The feature vector consists of Mel-frequency cepstral coefficients (MFCC), their delta derivatives ($\Delta$MFCC), and their double-delta derivatives ($\Delta\Delta$MFCC). The aim is to measure the accuracy of the models for different MFCC feature-vector sizes while fine-tuning the GMM and SVM models. In the proposed experimental setup, with a frame overlap of 75% and a feature size of 20 MFCC + 20 $\Delta$MFCC + 20 $\Delta\Delta$MFCC coefficients, SVM performs better than GMM. On the prepared data set, the highest accuracy of the model built using SVM is 100%, whereas that of GMM is 95.74%. The data set is prepared from voice samples of native speakers taken from the Voxforge website and a personal survey. The results on this diverse speech corpus indicate that the method performs equally well irrespective of language, gender, age, and region.
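To illustrate the feature layout described above, the following is a minimal sketch of how a 75% frame overlap and a 20 + 20 + 20 coefficient vector could be assembled. It is not the implementation used in this work: the helper names `frame_starts`, `delta`, and `stack_features` are hypothetical, the regression-based delta with a window of `n=2` is an assumed (though common) choice, the 400-sample frame length is an assumed value, and the MFCC values are synthetic placeholders rather than real speech features.

```python
def frame_starts(num_samples, frame_len, overlap=0.75):
    """Start indices of full frames when consecutive frames share `overlap` of their samples."""
    hop = int(round(frame_len * (1.0 - overlap)))
    if hop <= 0:
        raise ValueError("overlap must be less than 1")
    return list(range(0, num_samples - frame_len + 1, hop))


def delta(frames, n=2):
    """Regression-based delta over time (assumed formula, window n).

    Edges are handled by repeating the first and last frame.
    """
    num_frames = len(frames)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(num_frames):
        row = []
        for c in range(len(frames[0])):
            acc = 0.0
            for k in range(1, n + 1):
                nxt = frames[min(t + k, num_frames - 1)][c]
                prv = frames[max(t - k, 0)][c]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(mfcc):
    """Concatenate static, delta, and double-delta coefficients per frame."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(mfcc, d1, d2)]


# Assumed frame length of 400 samples; 75% overlap gives a hop of 100 samples.
starts = frame_starts(num_samples=1000, frame_len=400, overlap=0.75)

# Synthetic placeholder: 10 frames of 20 "MFCC" values (not real data).
mfcc = [[float(t * (c + 1)) for c in range(20)] for t in range(10)]
features = stack_features(mfcc)
print(len(starts), len(features), len(features[0]))
```

Each row of `features` holds 60 values per frame, which is the form of input a GMM or SVM classifier would be trained on under these assumptions.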