Abstract
Spoken queries submitted to question answering systems usually consist of query contents (e.g., about newspaper articles) and frozen patterns (e.g., WH-words), which can be modeled with N-gram models and grammar-based models, respectively. We propose a method to integrate these different types of models into a single N-gram model. We represent the two types of language models in a single word network. However, common smoothing methods, which are effective for N-gram models, weaken the grammatical constraints on frozen patterns. To address this problem, we propose a selective back-off smoothing method, which controls the degree to which smoothing is applied depending on the network fragment. Additionally, the resulting models are compatible with conventional back-off N-gram models, so existing N-gram decoders can easily be used. We show the effectiveness of our method through experiments.

1. INTRODUCTION

The N-gram model has been used successfully as a language model for large vocabulary continuous speech recognition (LVCSR) systems. The N-gram model is simple but robust enough to model all word sequences in the vocabulary. However, it needs a large training corpus, and such a corpus cannot easily be constructed unless there already exists a large text corpus based on, for example, newspaper articles. On the other hand, the grammar-based model is used as a language model for tasks involving a relatively small vocabulary. This model does not need a training corpus because it takes advantage of linguistic knowledge. It can model correlations more distant than is possible with the N-gram model, which can capture only local relations between words.

Thus, some spoken sentences can be modeled more suitably by N-gram models and others more suitably by the grammar-based model. This is also true from an intra-sentence perspective: some parts of a sentence are best modeled by an N-gram model and some parts are best modeled by a grammar-based model.

For example, question answering systems receive queries that often consist of a part that conveys various query contents about, for example, newspaper articles, and a part that represents a frozen pattern for query sentences. The first part seems to be best dealt with by using an N-gram model trained with a corpus of newspaper articles, and the second part is best dealt with by using the grammar-based model.
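As a rough illustration of the idea outlined in the abstract, the sketch below shows a Katz-style back-off bigram lookup in which backing off is suppressed for histories that belong to grammar-derived (frozen-pattern) fragments of the word network. This is only a minimal sketch of the intuition behind selective back-off smoothing; all names and data structures (bigram_p, unigram_p, alpha, frozen_histories) are hypothetical and are not the paper's actual formulation.

```python
def selective_backoff_bigram_prob(prev, word, bigram_p, unigram_p, alpha, frozen_histories):
    """Hypothetical selective back-off bigram probability.

    bigram_p[(prev, word)] : discounted probabilities for seen bigrams
    unigram_p[word]        : unigram probabilities
    alpha[prev]            : back-off weights
    frozen_histories       : histories inside grammar-derived (frozen-pattern)
                             fragments, where backing off is suppressed so the
                             grammar's constraints are preserved
    """
    if (prev, word) in bigram_p:
        return bigram_p[(prev, word)]        # explicit N-gram estimate
    if prev in frozen_histories:
        return 0.0                           # selective: no back-off in frozen fragments
    return alpha.get(prev, 1.0) * unigram_p.get(word, 0.0)  # ordinary back-off


# Toy usage (numbers are purely illustrative):
bigram_p = {("what", "is"): 0.6}
unigram_p = {"is": 0.1, "the": 0.2}
alpha = {"what": 0.4}
frozen = {"what"}  # "what" belongs to a frozen WH-pattern fragment
print(selective_backoff_bigram_prob("what", "is", bigram_p, unigram_p, alpha, frozen))   # 0.6
print(selective_backoff_bigram_prob("what", "the", bigram_p, unigram_p, alpha, frozen))  # 0.0
```

In a vanilla back-off model the second query would fall back to the unigram estimate; suppressing that fall-back for frozen-pattern histories is what keeps the grammatical constraints intact while the content part of the query still benefits from ordinary N-gram smoothing.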