Abstract

This paper describes an accurate and efficient algorithm for very-large-vocabulary continuous speech recognition. It is based on a two-stage LR parser with hidden Markov models (HMMs) as phoneme models. To improve recognition accuracy, it uses the forward and backward trellis likehood. To improve search efficiency, it uses adjusting windows and merges candidates that have the same allophonic phoneme sequences and grammatical state, and then merges candidates at the meaning level. This algorithm was applied to a telephone directory assistance system that contains more than 70,000 subscribers (about 80,000 words) to evaluate its speaker-independent speech recognition capabilities. For eight speakers, the algorithm achieved a speech understanding rate of 65% for spontaneous speech. The results show that the system performs well in spite of the large word perplexity. This paper also describes a multi-modal dialog system that uses our large-vocabulary speech recognition algorithm.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.