Issues in feature-based recognition of speech mixed with impulsive sounds

Nabil N. Bitar, Carol Y. Espy-Wilson, S. Hamid Nawab, Ramamurthy Mani

doi:10.1121/1.409418

Abstract

In this study, the feasibility of a knowledge-based approach to speaker-independent speech recognition in the presence of impulsive environmental sounds such as knocks, clinks, and claps is examined. Statistical approaches to speech recognition have had some success in dealing with steady background noise, probably because they have concentrated on routinely encountered steady background sounds, most of which can be modeled as white or colored noise. However, current statistical approaches are less suited to environments containing sporadic occurrences of various discrete-event sounds because (1) the variety of discrete-event sounds is enormous and (2) discrete-event sounds can be mixed with the speech signal at different loudness levels and temporal alignments. In this study, experiments are being performed on feature-based speech recognition using speech sounds (from a database of spoken telephone numbers) mixed with impulsive sounds (from a database of everyday environmental impulsive sounds). An important objective of this research is to determine how different acoustic cues (formants, pitch, frication noise, stop bursts, etc.) are influenced by the presence of different impulsive sounds. [This research was supported by NSF Research Grant No. IRI-9300194.]
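The abstract does not describe how the speech and impulsive sounds were combined. As a rough illustration of what mixing at different loudness levels and temporal alignments could involve, the following minimal sketch adds an impulsive sound into a speech signal at a chosen sample offset and a chosen speech-to-impulse level ratio. The function names `rms` and `mix_at_offset`, the RMS-based level definition, and the parameter names are all assumptions made for this illustration, not the authors' procedure.

```python
import math


def rms(signal):
    """Root-mean-square level of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_offset(speech, impulse, offset, speech_to_impulse_db):
    """Return a copy of `speech` with `impulse` added starting at `offset`.

    Hypothetical helper: the impulse is scaled so that the ratio of the
    overall speech RMS to the scaled impulse RMS equals
    `speech_to_impulse_db` decibels. Impulse samples that fall outside
    the speech signal are dropped.
    """
    impulse_level = rms(impulse)
    if impulse_level == 0.0:
        raise ValueError("impulse is silent; cannot set a level ratio")

    # Gain that brings the impulse to the requested level relative to speech.
    gain = rms(speech) / (impulse_level * 10.0 ** (speech_to_impulse_db / 20.0))

    mixed = list(speech)  # copy so the input speech is left unchanged
    for i, sample in enumerate(impulse):
        j = offset + i
        if 0 <= j < len(mixed):
            mixed[j] += gain * sample
    return mixed


if __name__ == "__main__":
    # Toy signals standing in for a speech utterance and a knock.
    speech = [1.0, -1.0] * 4
    knock = [2.0, -2.0]
    # Sweep both the level ratio and the temporal alignment.
    for ratio_db in (0.0, 20.0):
        for offset in (0, 3):
            print(ratio_db, offset, mix_at_offset(speech, knock, offset, ratio_db))
```

Sweeping `offset` and `speech_to_impulse_db` over a grid, as in the small loop above, is one way to generate mixtures in which the same impulsive sound overlaps different acoustic cues at different relative levels.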
