Abstract
This work compares the performance of Mel-Frequency Cepstral Coefficient (MFCC) and Perceptual Linear Prediction (PLP) features for developing a text-dependent speaker identification system. Continuously spoken Hindi sentences were used to train a separate HMM for each speaker with the HTK toolkit. The experiments were performed on a set of 200 continuously spoken sentences, with a vocabulary of 20000 isolated words, drawn from a database of 100 speakers. The results show a recognition accuracy of 92.26% with PLP features and 91.18% with MFCC features. A confusion matrix was constructed for all 20 test speakers from the recognition scores obtained for each speaker and that speaker's confusion with other speakers. Performance was compared under closed-set and open-set testing conditions. As expected, performance in the closed-set condition is far better than in the open-set condition. We propose that using PLP features in place of MFCC may improve speaker identification accuracy by reducing the number of false acceptances.
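The abstract does not spell out the decision rule, so the following is only a generic sketch of how closed-set and open-set identification, a confusion matrix, and false acceptances relate. It assumes that each enrolled speaker's HMM yields a log-likelihood score for a test utterance and that open-set rejection uses a single score threshold. The function names, the `REJECT` label, the threshold value, and all scores are hypothetical illustrations, not HTK's API and not data from this study.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

REJECT = "<reject>"  # hypothetical label meaning "not an enrolled speaker"


def identify(scores: Dict[str, float], threshold: Optional[float] = None) -> str:
    """Pick the enrolled speaker whose model scores the utterance highest.

    `scores` maps speaker ID -> log-likelihood under that speaker's HMM
    (assumed). threshold=None gives closed-set identification; otherwise the
    best match is rejected (open set) when its score falls below the threshold.
    """
    best = max(scores, key=lambda spk: scores[spk])
    if threshold is not None and scores[best] < threshold:
        return REJECT
    return best


def confusion_matrix(trials: List[Tuple[str, str]]) -> Counter:
    """Count (true speaker, decided speaker) pairs; off-diagonal = confusions."""
    return Counter(trials)


def accuracy(trials: List[Tuple[str, str]]) -> float:
    """Fraction of trials where the decision matches the true label."""
    return sum(true == pred for true, pred in trials) / len(trials)


def false_acceptances(trials: List[Tuple[str, str]]) -> int:
    """Unenrolled speakers (true label REJECT) accepted as an enrolled one."""
    return sum(true == REJECT and pred != REJECT for true, pred in trials)


# Toy scores for illustration only; not data from this study.
test_set = [
    ("spk01", {"spk01": -50.0, "spk02": -60.0}),
    ("spk02", {"spk01": -55.0, "spk02": -52.0}),
    ("spk02", {"spk01": -48.0, "spk02": -51.0}),  # confused with spk01
    (REJECT, {"spk01": -70.0, "spk02": -58.0}),   # speaker not enrolled
]

# Closed set: only enrolled speakers are tested, and the best match is always accepted.
closed = [(t, identify(s)) for t, s in test_set if t != REJECT]
# Open set: an unenrolled speaker is also tested, and low-scoring matches are rejected.
open_ = [(t, identify(s, threshold=-60.0)) for t, s in test_set]

print(accuracy(closed))                            # 0.6666666666666666
print(accuracy(open_), false_acceptances(open_))   # 0.5 1
print(confusion_matrix(closed))
```

In this toy run, the unenrolled speaker scores above the threshold and is falsely accepted, which lowers open-set accuracy relative to the closed set. Features that separate speakers better would reduce such false acceptances, which is the kind of gain the abstract attributes to PLP.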