Abstract

In this paper, we present a comparative study of spectral front-end features, derived from multitaper magnitude and phase spectra, as representations of speech signals for speaker verification with expressive speech. In particular, the multitaper modified group delay function (MT-MOGDF) and multitaper magnitude (MT-MAG) spectra of the speech signals are employed to obtain low-variance estimates of speech spectra. We observe that the cues that aid in the representation of expressive speech are more evident in the MT-MOGDF spectrum than in the MT-MAG spectrum, in terms of mean formant value and formant bandwidth. Our extensive experimental study of a speaker verification system with a Gaussian mixture model-universal background model (GMM-UBM) classifier on expressive speech, using the IITKGP-SESC and EMODB databases, shows that MT-MOGDF performs better than the MT-MAG technique in terms of equal error rate and minimum decision cost function. The improvement obtained with MT-MOGDF is attributed to a better representation and a low-variance estimate of the speech spectrum. Our results highlight the utility of MT-MOGDF as a potential alternative to the MT-MAG representation for speaker verification problems in general.
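The low-variance property of multitaper estimation mentioned above can be illustrated with a minimal sketch. It averages the power spectra obtained with several orthonormal tapers and compares the result with a single-taper estimate on a white-noise frame. The sine taper family, the frame length of 512 samples, the choice of 6 tapers, and the function names are all assumptions made for illustration; the abstract does not state which tapers or settings the study uses, and this sketch covers only the magnitude (power) side, not the modified group delay function.

```python
import numpy as np


def sine_tapers(n_samples, n_tapers):
    """Return an (n_tapers, n_samples) array of orthonormal sine tapers.

    Assumption: sine tapers stand in for whatever taper family the study uses.
    """
    n = np.arange(1, n_samples + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (n_samples + 1)) * np.sin(np.pi * k * n / (n_samples + 1))


def multitaper_power_spectrum(frame, n_tapers=6):
    """Average the power spectra of the frame under each taper."""
    frame = np.asarray(frame, dtype=float)
    tapers = sine_tapers(len(frame), n_tapers)
    # One eigenspectrum per taper; each row is a tapered copy of the frame.
    eigenspectra = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2
    # Uniform weighting across tapers (another simplifying assumption).
    return eigenspectra.mean(axis=0)


# White noise has a flat true spectrum, so the spread of an estimate across
# frequency bins reflects the variance of the estimator itself.
rng = np.random.default_rng(0)
frame = rng.standard_normal(512)
single = multitaper_power_spectrum(frame, n_tapers=1)
multi = multitaper_power_spectrum(frame, n_tapers=6)
print("single-taper variance:", np.var(single))
print("multitaper variance:  ", np.var(multi))
```

Averaging over more tapers lowers the variance of the estimate at the cost of frequency resolution, which is the trade-off a multitaper front end has to balance.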
