Abstract

Endpoint detection is a critical issue for several types of isolated utterance recognizers, because improper endpoints often result in recognition errors. Endpoint errors often stem from nonspeech artifacts, namely lip smacks, tongue and teeth clicks, and breath noise. Endpoint detectors based only on energy thresholds cannot correctly reject these artifacts, but adding a word model allows most of them to be properly rejected. The rules that implement the word model are: (1) the word cannot begin or end with two released plosives; (2) word initial stop gaps are less than 120 ms and word final ones less than 200 ms; (3) a word must contain a vocalic nucleus and be at least 100 ms in length; and (4) word final sounds containing only mid‐frequency energy are breath noise. The detection algorithm has been implemented on a Heuristics Speech Recognizer and tested using the Texas Instruments isolated word data base. The word model based system substantially reduced the error rate relative to an energy threshold based system.
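
The four word model rules above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical front end has already split the signal into energy segments and labeled each one as "vocalic", "burst" (a released plosive or click), "mid_only" (only mid-frequency energy), or "other". The `Segment` type, the labels, and the function name `find_endpoints` are all assumptions made for illustration, as is the reading of rule (1) as "no two adjacent bursts at a word edge". Only the 120 ms, 200 ms, and 100 ms thresholds come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Thresholds taken from rules (2) and (3) of the abstract.
MAX_INITIAL_GAP_MS = 120
MAX_FINAL_GAP_MS = 200
MIN_WORD_MS = 100


@dataclass
class Segment:
    start_ms: int
    end_ms: int
    kind: str  # hypothetical labels: "vocalic", "burst", "mid_only", "other"


def find_endpoints(segments: List[Segment]) -> Optional[Tuple[int, int]]:
    """Return (start_ms, end_ms) of the word, or None if no valid word exists.

    Assumes segments are time-ordered and that all vocalic segments
    belong to the same isolated word.
    """
    # Rule (3), first half: a word must contain a vocalic nucleus.
    nuclei = [i for i, s in enumerate(segments) if s.kind == "vocalic"]
    if not nuclei:
        return None
    lo, hi = nuclei[0], nuclei[-1]
    last_nucleus = hi

    # Grow the word leftward from the first nucleus.
    while lo > 0:
        prev = segments[lo - 1]
        # Rule (2): a word-initial stop gap is shorter than 120 ms.
        if segments[lo].start_ms - prev.end_ms >= MAX_INITIAL_GAP_MS:
            break
        # Rule (1): the word cannot begin with two released plosives.
        if prev.kind == "burst" and segments[lo].kind == "burst":
            break
        lo -= 1

    # Grow the word rightward from the last nucleus.
    while hi < len(segments) - 1:
        nxt = segments[hi + 1]
        # Rule (2): a word-final stop gap is shorter than 200 ms.
        if nxt.start_ms - segments[hi].end_ms >= MAX_FINAL_GAP_MS:
            break
        # Rule (1): the word cannot end with two released plosives.
        if nxt.kind == "burst" and segments[hi].kind == "burst":
            break
        hi += 1

    # Rule (4): word-final sounds with only mid-frequency energy are breath noise.
    while hi > last_nucleus and segments[hi].kind == "mid_only":
        hi -= 1

    # Rule (3), second half: a word is at least 100 ms long.
    start, end = segments[lo].start_ms, segments[hi].end_ms
    if end - start < MIN_WORD_MS:
        return None
    return start, end


# Made-up example: lip smack, then a word with initial and final plosives, then breath.
example = [
    Segment(0, 20, "burst"),         # lip smack, 200 ms before the word
    Segment(220, 240, "burst"),      # word-initial plosive release
    Segment(300, 500, "vocalic"),    # vocalic nucleus
    Segment(650, 680, "burst"),      # word-final plosive after a 150 ms stop gap
    Segment(780, 1000, "mid_only"),  # breath noise
]
print(find_endpoints(example))  # (220, 680)
```

In this made-up example a pure energy threshold would place the endpoints at 0 ms and 1000 ms, taking in both the lip smack and the breath noise. The rule-based sketch rejects the lip smack because its gap to the word is too long for a stop gap, and rejects the trailing segment because it carries only mid-frequency energy.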
