Abstract

The goal of statistical language modeling (SLM) is to estimate the likelihood (or probability) of a word string. SLM is fundamental to many natural language applications, such as automatic speech recognition (ASR) [Jelinek 1990], statistical machine translation (SMT) [Brown et al. 1993], and Asian language text input [Gao et al. 2002a]. Research on SLM involves two main tasks: modeling and estimation. Modeling determines the structure of a statistical model; estimation determines the model's free parameters from training data. SLM usually combines a parametric model with Maximum Likelihood Estimation (MLE) and various smoothing methods that tackle the data sparseness problem. Many different statistical models have been proposed, but n-gram models (in particular, bigram and trigram models) still dominate SLM research.

SLM has recently been shown to be an effective framework for several new applications, such as question answering [Berger 2001], text summarization, paraphrasing [Barzilay and Lee 2004], and information retrieval [Croft and Lafferty 2003]. However, these new applications bring new challenges. For example, in the SLM approaches to information retrieval, a language model must be trained on a single document, an extremely small training set, whereas in ASR a language model is typically trained on a million-word corpus. Recent developments in related techniques have stimulated new modeling and estimation methods that go beyond the scope of traditional approaches. Two representative examples of such techniques are statistical parsing and discriminative training.

With the ever-increasing popularity of SLM, we think it is the right time to assemble a special issue reflecting recent advances in both its theory and its applications. It
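To make the split between modeling and estimation concrete, here is a minimal Python sketch of a bigram model. It is an illustration only, not a method from any paper in this issue. Choosing the bigram structure is the modeling step; dividing bigram counts by history counts is the MLE step. Because MLE assigns zero probability to any bigram absent from the training data, the sketch smooths it with Jelinek-Mercer (linear) interpolation against a unigram model, one common smoothing method picked here purely for illustration. The class name `InterpolatedBigramLM`, the weight `lam=0.7`, and the `<s>`/`</s>` padding tokens are assumptions of this sketch. Words never seen in training still receive zero probability and would need extra handling, such as an unknown-word token.

```python
from collections import Counter
import math


class InterpolatedBigramLM:
    """Hypothetical bigram LM: MLE counts smoothed by Jelinek-Mercer interpolation."""

    def __init__(self, sentences, lam=0.7):
        self.lam = lam            # weight on the bigram MLE; (1 - lam) goes to the unigram MLE
        self.history = Counter()  # c(v): how often v occurs as a bigram history
        self.bigram = Counter()   # c(v, w)
        self.unigram = Counter()  # c(w) over predicted tokens (words and </s>)
        for sent in sentences:
            padded = ["<s>"] + sent + ["</s>"]  # assumed sentence-boundary tokens
            for v, w in zip(padded, padded[1:]):
                self.history[v] += 1
                self.bigram[(v, w)] += 1
                self.unigram[w] += 1
        self.total = sum(self.unigram.values())

    def p_ml_bigram(self, v, w):
        # MLE: c(v, w) / c(v); zero for unseen bigrams (the sparseness problem)
        c_v = self.history[v]
        return self.bigram[(v, w)] / c_v if c_v else 0.0

    def p_ml_unigram(self, w):
        # MLE unigram estimate: c(w) / N
        return self.unigram[w] / self.total

    def prob(self, v, w):
        # Interpolated estimate: non-zero for every word seen in training
        return self.lam * self.p_ml_bigram(v, w) + (1 - self.lam) * self.p_ml_unigram(w)

    def log_prob(self, sent):
        # Log probability of a whole sentence; math.log raises ValueError
        # if a word was never seen in training (probability 0)
        padded = ["<s>"] + sent + ["</s>"]
        return sum(math.log(self.prob(v, w)) for v, w in zip(padded, padded[1:]))


train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
lm = InterpolatedBigramLM(train, lam=0.7)
```

With this toy training set, the bigram "the sat" never occurs, so `lm.p_ml_bigram("the", "sat")` is 0 and any sentence containing it would get zero probability under pure MLE. The interpolated `lm.prob("the", "sat")` is instead 0.3 × 2/8 = 0.075 (up to floating-point rounding). The smoothed probabilities for any seen history still sum to one over the vocabulary.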
