Abstract

An efficient, scalable speech recognition architecture combining topic detection and topic-dependent language modeling is proposed for multi-domain dialog systems. The domain is automatically inferred from the user's utterance, and speech recognition is then performed with an appropriate domain-dependent language model. The architecture improves accuracy and efficiency over current approaches and scales to a large number of domains. In this paper, a novel framework using a multilayer hierarchy of language models is introduced to improve robustness against topic detection errors. The proposed system achieves a relative reduction in word error rate (WER) of 10.5% over a single language model system. Furthermore, its accuracy is comparable to that of running multiple language models in parallel, at only a fraction of the computational cost.
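To make the two-stage control flow concrete, the following is a minimal sketch (not the paper's implementation; all class and function names are illustrative stubs) of recognizing an utterance with a general language model, detecting the topic from the first-pass result, and then re-decoding with a domain-dependent model, backing off to the broader model when topic detection is uncertain:

```python
from dataclasses import dataclass

@dataclass
class LanguageModel:
    """Stand-in for a recognizer pass with a given language model."""
    name: str

    def decode(self, audio_features: list) -> str:
        # A real system would run the speech decoder here; this stub
        # just returns a dummy transcript labeled with the LM used.
        return f"<transcript decoded with {self.name} LM>"

def detect_topic(first_pass_text: str) -> tuple[str, float]:
    # Toy topic detector: in practice this would be a classifier trained
    # on in-domain text; here it returns a fixed guess and confidence.
    return "weather", 0.8

def recognize(audio_features: list,
              general_lm: LanguageModel,
              domain_lms: dict[str, LanguageModel],
              threshold: float = 0.6) -> str:
    # Pass 1: decode with the general (upper-layer) language model.
    first_pass = general_lm.decode(audio_features)
    # Detect the topic from the first-pass hypothesis.
    topic, confidence = detect_topic(first_pass)
    # Pass 2: use the domain-dependent LM if detection is confident,
    # otherwise fall back to the broader model in the hierarchy.
    if confidence >= threshold and topic in domain_lms:
        second_pass_lm = domain_lms[topic]
    else:
        second_pass_lm = general_lm
    return second_pass_lm.decode(audio_features)

if __name__ == "__main__":
    lms = {"weather": LanguageModel("weather"), "travel": LanguageModel("travel")}
    print(recognize([0.1, 0.2], LanguageModel("general"), lms))
```

The key design point the sketch illustrates is that only two decoding passes are needed per utterance, rather than one pass per candidate domain model run in parallel.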
