Abstract
Spoken queries submitted to question answering systems usually consist of query contents (e.g., about newspaper articles) and frozen patterns (e.g., WH-words), which can be modeled with N-gram models and grammar-based models, respectively. We propose a method to integrate these different types of models into a single N-gram model. We represent the two types of language models in a single word network. However, common smoothing methods, which are effective for N-gram models, weaken the grammatical constraints on frozen patterns. To address this problem, we propose a selective back-off smoothing method, which controls the degree to which smoothing is applied depending on the network fragment. Additionally, the resulting models are compatible with conventional back-off N-gram models, so existing N-gram decoders can easily be used. We show the effectiveness of our method through experiments.

1. INTRODUCTION

The N-gram model has been used successfully as a language model for large vocabulary continuous speech recognition (LVCSR) systems. The N-gram model is simple but robust enough to model all word sequences in the vocabulary. However, it needs a large training corpus, and such a corpus cannot easily be constructed unless there already exists a large text corpus based on, for example, newspaper articles. On the other hand, the grammar-based model is used as a language model for tasks involving a relatively small vocabulary. This model does not need a training corpus because it takes advantage of linguistic knowledge. It can model correlations more distant than is possible with the N-gram model, which can capture only local relations between words.

Thus, some spoken sentences can be modeled more suitably by N-gram models and others more suitably by the grammar-based model. This is also true from an intra-sentence perspective: some parts of a sentence are best modeled by an N-gram model and some parts are best modeled by a grammar-based model.

For example, question answering systems receive queries that often consist of a part that conveys various query contents about, for example, newspaper articles, and a part that represents a frozen pattern for query sentences. The first part seems to be best dealt with by using an N-gram model trained with a corpus of newspaper articles, and the second part is best dealt with by using the grammar-based model.
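As a rough illustration of the idea outlined in the abstract, the sketch below shows a Katz-style back-off bigram lookup in which backing off is suppressed for histories that belong to grammar-derived (frozen-pattern) fragments of the word network. This is only a minimal sketch of the intuition behind selective back-off smoothing; all names and data structures (bigram_p, unigram_p, alpha, frozen_histories) are hypothetical and are not the paper's actual formulation.

```python
def selective_backoff_bigram_prob(prev, word, bigram_p, unigram_p, alpha, frozen_histories):
    """Hypothetical selective back-off bigram probability.

    bigram_p[(prev, word)] : discounted probabilities for seen bigrams
    unigram_p[word]        : unigram probabilities
    alpha[prev]            : back-off weights
    frozen_histories       : histories inside grammar-derived (frozen-pattern)
                             fragments, where backing off is suppressed so the
                             grammar's constraints are preserved
    """
    if (prev, word) in bigram_p:
        return bigram_p[(prev, word)]        # explicit N-gram estimate
    if prev in frozen_histories:
        return 0.0                           # selective: no back-off in frozen fragments
    return alpha.get(prev, 1.0) * unigram_p.get(word, 0.0)  # ordinary back-off


# Toy usage (numbers are purely illustrative):
bigram_p = {("what", "is"): 0.6}
unigram_p = {"is": 0.1, "the": 0.2}
alpha = {"what": 0.4}
frozen = {"what"}  # "what" belongs to a frozen WH-pattern fragment
print(selective_backoff_bigram_prob("what", "is", bigram_p, unigram_p, alpha, frozen))   # 0.6
print(selective_backoff_bigram_prob("what", "the", bigram_p, unigram_p, alpha, frozen))  # 0.0
```

In a vanilla back-off model the second query would fall back to the unigram estimate; suppressing that fall-back for frozen-pattern histories is what keeps the grammatical constraints intact while the content part of the query still benefits from ordinary N-gram smoothing.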