Abstract

Accurate detection of the boundaries of a speech utterance during a recording interval has been shown to be crucial for reliable and robust automatic speech recognition. The endpoint detection problem is fairly straightforward for high-level speech signals spoken in low-level stationary noise environments. However, these ideal conditions do not always exist. One example where reliable word detection is difficult is speech spoken in a mobile environment. Currently, most endpoint detection algorithms use only signal energy and duration information to perform the endpoint detection task. In this paper, an endpoint detection algorithm is presented that is based on hidden Markov model (HMM) technology. Several endpoint detection schemes were tested on a speaker-dependent speech database from four talkers, recorded in a mobile environment under five different driving conditions (including while traveling at 60 mph with the fan on). The results showed that the HMM-based approach to endpoint detection performed significantly better than the energy-based system. The overall accuracy of the system using the HMM endpoint detector, when tested on the 11-word digit vocabulary (zero through nine and oh) with speech recorded in various mobile environments, was 99.4%. A top recognition performance of 99.7%, under the same conditions, was achieved by using an HMM recognizer with explicit endpointing.
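
For readers unfamiliar with the energy-and-duration baseline mentioned above, the following is a minimal sketch of that general idea, not the system evaluated in the paper. The function names, the 80-sample frame length, the -30 dB threshold, and the 5-frame minimum duration are all illustrative assumptions.

```python
import math


def frame_energies_db(samples, frame_len):
    """Log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on digital silence.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def detect_endpoints(samples, frame_len=80, threshold_db=-30.0, min_frames=5):
    """Return (start_sample, end_sample) of the longest run of frames whose
    energy exceeds threshold_db, or None if no run lasts min_frames.

    All default values are hypothetical and would need tuning per task.
    """
    energies = frame_energies_db(samples, frame_len)
    best = None       # (first_frame, one_past_last_frame) of the best run
    run_start = None  # first frame of the run currently being tracked
    # A -inf sentinel closes a run that extends to the last frame.
    for i, energy in enumerate(energies + [float("-inf")]):
        if energy > threshold_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            length = i - run_start
            # Duration rule: ignore bursts shorter than min_frames.
            if length >= min_frames and (best is None or length > best[1] - best[0]):
                best = (run_start, i)
            run_start = None
    if best is None:
        return None
    return best[0] * frame_len, best[1] * frame_len


def make_test_signal(silence=800, speech=1600, amplitude=0.5):
    """Synthetic utterance: silence, a sine burst, then silence again."""
    burst = [amplitude * math.sin(2 * math.pi * n / 8) for n in range(speech)]
    return [0.0] * silence + burst + [0.0] * silence


if __name__ == "__main__":
    print(detect_endpoints(make_test_signal()))
```

A fixed threshold of this kind works on clean input, but it cannot distinguish speech from loud non-stationary noise, which is the weakness in mobile environments that motivates the HMM-based detector.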
