Abstract

The popular n-gram language model (LM) performs poorly on infrequent words. Conventional approaches such as class-based LMs address this by pre-defining sharing structures (e.g., word classes). However, defining such structures requires prior knowledge, and the context sharing they induce is generally inaccurate. This paper presents a novel similar-word model to enhance infrequent words. In principle, we enrich the context of an infrequent word by borrowing context information from some “similar words.” Compared to conventional class-based methods, this new approach offers fine-grained context sharing by referring to the words that best match the target word, and it is more flexible because no sharing structures need to be defined by hand. Experiments on a large-scale Chinese speech recognition task demonstrate that the similar-word approach significantly improves performance on infrequent words while leaving performance on general tasks almost unchanged.
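To make the borrowing idea concrete, here is a minimal Python sketch, not the paper's actual formulation: the similar-word lists, the frequency threshold, and the interpolation weight are all illustrative assumptions. It shows one plausible instance of the technique, where a bigram probability for an infrequent word is interpolated with the bigram probabilities of its similar words.

```python
from collections import Counter

# Minimal sketch (not the paper's exact method): a bigram model whose
# probability for an infrequent word is interpolated with the bigram
# probabilities of hand-picked "similar words". LAMBDA, RARE_THRESHOLD,
# and the `similar` map are illustrative assumptions.

LAMBDA = 0.5          # weight on the word's own (sparse) statistics
RARE_THRESHOLD = 3    # words seen fewer times than this are "infrequent"

corpus = "the cat sat on the mat the dog lay on the mat a bird sat on the rug".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

# Hypothetical similar-word lists; the paper derives these from data.
similar = {"rug": ["mat"]}

def bigram_prob(history, word):
    """Plain maximum-likelihood bigram probability P(word | history)."""
    if unigrams[history] == 0:
        return 0.0
    return bigrams[(history, word)] / unigrams[history]

def similar_word_prob(history, word):
    """Interpolate a rare word's bigram with its similar words' bigrams."""
    own = bigram_prob(history, word)
    if unigrams[word] >= RARE_THRESHOLD or word not in similar:
        return own
    borrowed = sum(bigram_prob(history, s) for s in similar[word])
    borrowed /= len(similar[word])
    return LAMBDA * own + (1 - LAMBDA) * borrowed

print(similar_word_prob("the", "rug"))  # 0.3, boosted by counts of "the mat"
```

The sketch illustrates why the sharing is fine-grained: the rare word “rug” borrows only from its own matched word “mat”, rather than from every member of a pre-defined word class as a class-based LM would.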
