Abstract

The main objective of this paper is to explore the effectiveness of perceptual features combined with pitch for text-independent speaker recognition. In the proposed algorithm, these features are extracted and a Gaussian mixture model (GMM) is trained for every speaker from L feature vectors of that speaker's speech. Speakers are identified by first computing the a posteriori probability density between the mixtures of each speaker model and the test speech vectors; the test utterance is then assigned to the speaker model that yields the maximum probability density. With GMM speaker models of 12 mixtures, mel frequency perceptual linear predictive cepstrum combined with pitch gives an overall accuracy of 98% in identifying a speaker among 8 speakers chosen randomly from 8 different dialect regions of the TIMIT database. The same feature gives an average accuracy of 95.75% for 8 speakers chosen randomly from the same dialect region, again with 12-mixture GMM speaker models. Mel frequency linear predictive cepstrum gives accuracies of 96.75% and 96.125% with 16-mixture GMM speaker models for speakers from different dialect regions and from the same dialect region, respectively. The algorithm is also evaluated with GMM speaker models of 4, 8 and 32 mixtures. The 12-mixture GMM speaker models are additionally tested on a population of 20 speakers, and the accuracy is found to be slightly lower than for the population of 8 speakers. A noteworthy feature of the speaker identification algorithm is that testing is carried out on identical messages for all speakers. The work is extended to speaker verification, whose performance is measured in terms of % false rejection rate, % false acceptance rate and % equal error rate. The % false acceptance rate and % equal error rate are lower for mel frequency perceptual linear predictive cepstrum with pitch, and the % false rejection rate is lower for mel frequency linear predictive cepstrum. The F-ratio is computed on the features of the training speech as a theoretical measure to validate the experimental results for perceptual features with pitch. The χ² distribution is used to statistically justify the experimental results for all the features in both speaker identification and verification.
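The identification rule described above (score the test vectors against every speaker's GMM and pick the best-scoring model) can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes diagonal covariances, equal speaker priors, and frame scores combined by summing log-likelihoods. The function names, the two-dimensional toy features and the hand-written model parameters are all hypothetical; in practice the models would be trained on the perceptual features with pitch.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Summed log-likelihood of all frames under one speaker's GMM.

    gmm is a list of (weight, mean, var) tuples, one per mixture component.
    """
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(comp)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total

def identify_speaker(frames, models):
    """Return the best-scoring speaker name and the per-speaker scores."""
    scores = {name: gmm_log_likelihood(frames, gmm) for name, gmm in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical 2-mixture models over 2-dimensional toy features (assumption:
# real models would have 12 or 16 mixtures and be estimated from training speech).
models = {
    "spk_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "spk_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
test_frames = [[0.2, -0.1], [0.9, 1.2], [0.4, 0.6]]
best, scores = identify_speaker(test_frames, models)
print(best)
```

With equal priors, choosing the model with the largest summed log-likelihood is equivalent to choosing the largest a posteriori probability, which is why the sketch compares log-likelihoods directly.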
