Abstract
This paper proposes a novel Language Model (LM) adaptation method based on Minimum Discrimination Information (MDI). In the proposed method, a background LM is viewed as a discrete distribution, and an adapted LM is built to be as close as possible to the background LM while satisfying a unigram constraint. The method targets settings where only a limited amount of domain corpus is available for adapting a natural-language-based intelligent personal assistant system. Two unigram-constraint estimation methods are proposed: one based on word frequencies in the domain corpus, and one based on word similarities estimated from WordNet. In terms of the adapted LM's perplexity, using word frequencies from tiny domain corpora (30~120 seconds in length) yields relative improvements of 13.9%~16.6%. Further relative improvements (1.5%~2.4%) are observed when WordNet is used to generate word similarities. These results demonstrate an efficient way of re-scaling and normalizing the conditional distribution, compared with an interpolation-based LM.
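The core re-scaling step described above can be illustrated with a minimal sketch. Under MDI adaptation with a unigram constraint, each background conditional probability is multiplied by a unigram ratio and then renormalized per history. The function name `mdi_adapt`, the dictionary-based LM representation, the damping exponent `beta`, and the uniform history weighting used to estimate the background unigram marginal are all illustrative assumptions, not the paper's exact formulation:

```python
def mdi_adapt(background, domain_unigram, beta=0.5):
    """Rescale a background conditional LM by unigram ratios, then renormalize.

    background: dict mapping history -> {word: P_bg(word | history)}
    domain_unigram: dict mapping word -> P_domain(word), estimated from the
        (tiny) domain corpus or from WordNet-based word similarities
    beta: damping exponent on the unigram ratio (assumed hyperparameter)
    """
    # Estimate the background unigram marginal; for this sketch we weight
    # all histories uniformly (an assumption, not part of the method).
    bg_unigram = {}
    for dist in background.values():
        for w, p in dist.items():
            bg_unigram[w] = bg_unigram.get(w, 0.0) + p
    total = sum(bg_unigram.values())
    bg_unigram = {w: p / total for w, p in bg_unigram.items()}

    adapted = {}
    for h, dist in background.items():
        # Scale each conditional by the damped ratio alpha(w)^beta,
        # where alpha(w) = P_domain(w) / P_bg(w); unseen words keep alpha = 1.
        scaled = {
            w: p * (domain_unigram.get(w, bg_unigram[w]) / bg_unigram[w]) ** beta
            for w, p in dist.items()
        }
        z = sum(scaled.values())  # per-history normalizer Z(h)
        adapted[h] = {w: p / z for w, p in scaled.items()}
    return adapted
```

Because the scaling factor depends only on the word (not the history), the adapted model stays close to the background LM in the MDI sense while shifting probability mass toward domain-frequent words; the per-history normalization keeps each conditional a valid distribution.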