Abstract

To investigate the robustness of a small-vocabulary, speaker-independent word recognizer trained on normal speech, the recognizer was tested on Lombard speech. The Lombard database was collected from talkers exposed to 85 dB SPL white Gaussian noise played through headphones. Recognition tests used a hidden Markov model recognizer with separate codebooks for static and dynamic cepstral features [Gupta et al., in Proc. IEEE ICASSP-87, 697–700 (1987)]. The front-end analysis used either standard linear prediction (LP) or a version of perceptually based linear prediction (PLP). (This PLP has filtering functions with 3-dB bandwidths of 1 Bark, a correction to Eq. (3) in Hermansky et al. [Proc. IEEE ICASSP-85, 509–512 (1985)].) The effects of the analysis technique, analysis order, distance measure (cepstral or weighted cepstral), and spectral features (static, dynamic, or integrated) were studied. The utility of static and dynamic features differed greatly according to the analysis method and distance measure. For example, for low (5th)-order PLP, the dynamic features alone performed much better than either static or integrated features. In contrast, for high (14th)-order LP with weighted cepstral distance, the dynamic features alone performed much worse than either static or integrated features, yet still contributed to improved recognition rates when combined with static features in the integrated set.
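
As an illustration of the terms used above, the following minimal sketch shows one common way to derive dynamic features from a sequence of static cepstral vectors and to compare two vectors with a plain or weighted cepstral distance. It is not the recognizer described in the abstract: the regression window length, the edge handling, and the index-proportional weighting are assumptions chosen for illustration, and the function names (`delta_features`, `cepstral_distance`, `index_weights`) are hypothetical.

```python
import math

def delta_features(frames, half_window=2):
    """Dynamic features: least-squares slope of each cepstral coefficient
    over a window of 2*half_window + 1 frames (window length is an assumption)."""
    n = len(frames)
    dim = len(frames[0])
    denom = sum(k * k for k in range(-half_window, half_window + 1))
    deltas = []
    for t in range(n):
        row = [0.0] * dim
        for k in range(-half_window, half_window + 1):
            # Clamp at the edges by repeating the first/last frame (assumption).
            src = frames[min(max(t + k, 0), n - 1)]
            for j in range(dim):
                row[j] += k * src[j]
        deltas.append([v / denom for v in row])
    return deltas

def index_weights(order):
    """Illustrative weighting: weight k for the k-th cepstral coefficient.
    The abstract does not state which weighting was used."""
    return [float(k) for k in range(1, order + 1)]

def cepstral_distance(c1, c2, weights=None):
    """Euclidean distance between cepstral vectors, optionally weighted
    per coefficient before squaring."""
    if weights is None:
        weights = [1.0] * len(c1)
    return math.sqrt(sum((w * (a - b)) ** 2 for w, a, b in zip(weights, c1, c2)))

if __name__ == "__main__":
    # Toy static cepstra: each coefficient rises linearly with frame index.
    static = [[float(t), 2.0 * t] for t in range(7)]
    dynamic = delta_features(static)
    print(dynamic[3])  # slope estimate at an interior frame
    print(cepstral_distance([1, 2, 3], [0, 0, 0]))
    print(cepstral_distance([1, 2, 3], [0, 0, 0], index_weights(3)))
```

In this sketch the static vectors and their slopes are kept as separate sequences, mirroring the idea of separate codebooks for static and dynamic features. How the two are combined into the "integrated" features is not specified in the abstract and is left out here.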
