Processing of linear prediction residual in spectral and cepstral domains for speaker information

Debadatta Pati, S R Mahadeva Prasanna

doi:10.1007/s10772-015-9273-9

Abstract

In this work, the linear prediction (LP) residual is processed in the spectral and cepstral domains to model speaker-specific excitation information. In the spectral domain, the excitation energy information is modeled by subband energies (SBE), and the excitation periodicity information is modeled by the power differences of spectrum in subband (PDSS) measure. This work refines the existing methods of extracting SBE and PDSS by exploiting the nature of the excitation spectrum. The SBE and PDSS values are computed from the mel-warped residual subband spectrum and are called residual mel subband energies (R-MSE) and mel power differences of subband spectra (M-PDSS), respectively. Speaker recognition studies performed on the NIST-99 and NIST-03 databases demonstrate that the R-MSE and M-PDSS features carry good speaker information. It is also demonstrated that the excitation energy information can be better modeled in the cepstral domain by residual mel frequency cepstral coefficients (R-MFCC). Further, the evidence provided by the M-PDSS and R-MFCC features is different and combines well, providing improved recognition performance. Combining the evidence from M-PDSS and R-MFCC with the vocal tract information improves the performance further. Finally, a comparative study of processing the LP residual in the temporal, spectral and cepstral domains demonstrates that, at a small cost in recognition performance, processing the LP residual in the spectral and cepstral domains provides a compact and effective way of representing the excitation information compared with temporal processing.
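The abstract does not give formulas, so the following is only a minimal sketch of the kind of processing it describes: an LP residual is obtained by inverse filtering, and per-band energy and periodicity measures are computed from the residual power spectrum over mel-spaced bands. Several things here are assumptions rather than the authors' method: PDSS is taken to be one minus the ratio of the geometric mean to the arithmetic mean of the power spectrum within a band, the bands are plain rectangular bins between mel-spaced edges rather than any particular filterbank, and all function names, the frame length, the LP order and the synthetic test frame are hypothetical choices for illustration.

```python
import cmath
import math
import random


def lp_coefficients(frame, order):
    """Autocorrelation-method LP via Levinson-Durbin; returns a with a[0] == 1."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. all zeros): stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(frame, a):
    """Inverse-filter the frame with A(z): e[n] = sum_k a[k] * s[n-k]."""
    return [sum(a[k] * frame[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(frame))]


def power_spectrum(x):
    """Naive O(N^2) DFT power spectrum for bins 0..N/2 (fine for one frame)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def mel_band_edges(num_bands, fs, num_bins):
    """Bin indices of band edges equally spaced on the mel scale (assumed layout)."""
    mel_max = 2595.0 * math.log10(1.0 + (fs / 2.0) / 700.0)
    edges = []
    for i in range(num_bands + 1):
        hz = 700.0 * (10.0 ** (mel_max * i / num_bands / 2595.0) - 1.0)
        edges.append(round(hz / (fs / 2.0) * (num_bins - 1)))
    edges[-1] = num_bins  # let the last half-open band include the Nyquist bin
    return edges


def subband_features(power, edges, eps=1e-12):
    """Per-band log energy and PDSS (assumed: 1 - geometric mean / arithmetic mean)."""
    log_energy, pdss = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = [p + eps for p in power[lo:max(hi, lo + 1)]]
        arith = sum(band) / len(band)
        geo = math.exp(sum(math.log(p) for p in band) / len(band))
        log_energy.append(math.log(sum(band)))
        pdss.append(1.0 - geo / arith)
    return log_energy, pdss


# Synthetic voiced-like frame (illustrative only): a 100 Hz impulse train plus
# weak noise, passed through a stable two-pole resonator, sampled at 8 kHz.
rng = random.Random(0)
fs, order, num_bands = 8000, 10, 8
frame = []
for n in range(240):
    excitation = (1.0 if n % 80 == 0 else 0.0) + 0.01 * rng.gauss(0.0, 1.0)
    s1 = frame[-1] if n >= 1 else 0.0
    s2 = frame[-2] if n >= 2 else 0.0
    frame.append(excitation + 1.3 * s1 - 0.8 * s2)

a = lp_coefficients(frame, order)
residual = lp_residual(frame, a)
power = power_spectrum(residual)
edges = mel_band_edges(num_bands, fs, len(power))
log_energy, pdss = subband_features(power, edges)
```

Under the assumed definition, a PDSS value near 1 indicates a peaky, harmonic band and a value near 0 indicates a flat, noise-like band, which is why such a measure can reflect excitation periodicity. A practical front end would also window the frame and use an FFT; both are omitted here to keep the sketch dependency-free.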

Journal: International Journal of Speech Technology
Publication Date: Feb 24, 2015
Citations: 5

