Fast exact maximum likelihood estimation for mixture of language models

Yi Zhang, Wei Xu

doi:10.1016/j.ipm.2007.12.003

Abstract

Language modeling is an effective and theoretically attractive probabilistic framework for text information retrieval. The basic idea of this approach is to estimate a language model for a given document (or document set) and then perform retrieval or classification based on this model. A common language modeling approach assumes that the data D is generated from a mixture of several language models. The core problem is to find the maximum likelihood estimate of one mixture component, given fixed mixture weights and the other mixture component. The EM algorithm is usually used to find the solution. In this paper, we prove that an exact maximum likelihood estimate of the unknown mixture component exists and can be calculated using the new algorithm we propose. We further improve the algorithm and provide an efficient algorithm of O(k) complexity to find the exact solution, where k is the number of words occurring at least once in data D. Furthermore, we prove that the probabilities of many words are exactly zero, so that maximum likelihood estimation explicitly acts as a feature selection technique.
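
To make the estimation problem concrete, the following is a minimal sketch for the two-component case, assuming a known background model p with weight alpha and an unknown model q with weight beta = 1 - alpha. It is derived from the standard KKT conditions for maximizing sum_w f(w) log(alpha p(w) + beta q(w)) subject to q summing to one, and it uses a simple sort, so it runs in O(k log k) rather than the O(k) reported in the abstract; it is not a reproduction of the paper's algorithm. The function name `exact_mixture_mle` and its parameters are hypothetical names chosen for illustration.

```python
def exact_mixture_mle(counts, background, alpha):
    """Sketch: exact MLE of the unknown component q in r = alpha*p + beta*q.

    counts     -- dict word -> count f(w) in data D (assumed input format)
    background -- dict word -> known probability p(w) > 0
    alpha      -- fixed weight of the known component, 0 < alpha < 1
    Returns a dict with only the words whose estimated q(w) is positive;
    every other word has probability exactly zero.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    beta = 1.0 - alpha

    # Only words that occur in D can receive positive probability.
    words = [w for w, f in counts.items() if f > 0]
    if not words:
        raise ValueError("data D contains no words")

    # Rank words by f(w) / p(w), largest first.
    words.sort(key=lambda w: counts[w] / background[w], reverse=True)

    sum_f = 0.0   # total count of words in the active set
    sum_p = 0.0   # total background mass of words in the active set
    active = []
    for w in words:
        f, p = counts[w], background[w]
        # KKT condition: w gets q(w) > 0 iff f * (beta/alpha + sum_p) > p * sum_f.
        if f * (beta / alpha + sum_p) > p * sum_f:
            sum_f += f
            sum_p += p
            active.append(w)
        else:
            break  # all remaining words have a smaller ratio and also fail

    # 1/lambda follows from the constraint that q sums to one.
    inv_lambda = (1.0 + (alpha / beta) * sum_p) / sum_f
    return {w: counts[w] * inv_lambda - (alpha / beta) * background[w]
            for w in active}


q = exact_mixture_mle({"a": 3, "b": 1}, {"a": 0.5, "b": 0.5}, alpha=0.5)
print(q)
```

In this toy call the word "b" is dropped from the estimate entirely, which illustrates the feature selection effect described in the abstract: words that the background model already explains well enough receive probability exactly zero.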

Journal: Information Processing & Management
Publication Date: Jan 29, 2008
Citations: 4

Similar Papers

Fast exact maximum likelihood estimation for mixture of language models
Yi Zhang ... Wei Xu
23 Jul 2007

Adjusted Maximum Likelihood and Pseudo-Likelihood Estimation for Noisy Gaussian Markov Random Fields
Ian Dryden ... Luca Romagnoli
Journal of Computational and Graphical Statistics | VOL. 11
01 Jun 2002

Statistical approaches to estimating mean water quality concentrations with detection limits.
Robert H Shumway ... Rahman S Azari
Environmental Science & Technology | VOL. 36
19 Jun 2002

Estimation in the piece-wise constant hazard rate model-when the data are grouped
Thomas J Boardman ... Robert E Colvert
Communications in Statistics - Theory and Methods | VOL. 8
01 Jan 1979
