Abstract

Within the context of automatic speech recognition (ASR) applications for telephony, we investigate the acoustic preprocessing issues that arise in moving from the fixed-line to the cellular network. Because the spectral representation used in enhanced full rate GSM is linear prediction, we investigate the relative advantages and drawbacks of conventional mel-frequency cepstral coefficient (MFCC) parameters derived from a non-parametric fast Fourier transform (FFT) and MFCC parameters derived from a linear predictive coding (LPC) spectral estimate. Robust formant parameters, also derived from an LPC description of the spectrum, are studied as an alternative to MFCCs. Within the framework of connected digit recognition based on hidden Markov models, ASR performance was measured for clean conditions as well as for three different additive noise conditions. In addition, the performance of a conventional recognition procedure was compared with that of an ASR system based on our acoustic backing-off implementation of missing feature theory (MFT).

