Abstract

In this work the linear prediction (LP) residual is processed in spectral and cepstral domains to model the speaker-specific excitation information. In the spectral domain, the excitation energy information is modeled from subband energies (SBE). The excitation periodicity information is modeled by power differences of spectrum in subband (PDSS) measure. This work carries some refinements in the existing methods of extracting SBE and PDSS by exploiting the nature of the excitation spectrum. The SBE and PDSS values are computed from mel warped residual subband spectrum and called as residual mel subband energies (R-MSE) and mel power differences of subband spectra (M-PDSS), respectively. The different speaker recognition studies performed using NIST-99 and NIST-03 databases demonstrate that R-MSE and M-PDSS features represent good speaker information. It is also demonstrated that the excitation energy information can be better modeled in the cepstral domain by residual mel frequency cepstral coefficients (R-MFCC). Furhter, the evidences provided by M-PDSS and R-MFCC features are different and combine well and provides improved recognition performance. The combined evidence from M-PDSS and R-MFCC together with the vocal tract information further improves the performance. Finally, a comparative study on processing the LP residual in temporal, spectral and cepstral domains demonstrates that with a small compromise with the recognition performance, processing LP residual in spectral and cepstral domains provide compact and effective way of representing the excitation information, as compared to temporal processing.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.