Abstract

We present a directed Markov random field (MRF) model that combines n‐gram models, probabilistic context‐free grammars (PCFGs), and probabilistic latent semantic analysis (PLSA) for the purpose of statistical language modeling. Even though the composite directed MRF model potentially has an exponential number of loops and becomes a context‐sensitive grammar, we are nevertheless able to estimate its parameters in cubic time using an efficient modified Expectation‐Maximization (EM) method, the generalized inside–outside algorithm, which extends the inside–outside algorithm to incorporate the effects of the n‐gram and PLSA language models. We generalize various smoothing techniques to alleviate the sparseness of n‐gram counts in cases where there are hidden variables. We also derive an analogous algorithm to find the most likely parse of a sentence and to calculate the probability of an initial subsequence of a sentence, all generated by the composite language model. Our experimental results on the Wall Street Journal corpus show that we obtain significant reductions in perplexity compared to the state‐of‐the‐art baseline trigram model with Good–Turing and Kneser–Ney smoothing techniques.
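For orientation, the generalized inside–outside algorithm mentioned above extends the standard inside–outside computation for PCFGs. The sketch below shows only that standard inside pass for a grammar in Chomsky normal form, which computes the total probability of a sentence by summing over all parses in time cubic in sentence length; it is not the paper's composite-model generalization, and the toy grammar and function name are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

def inside_probability(words, lexical, binary, start="S"):
    """Standard inside (CKY-style) pass for a PCFG in Chomsky normal form.

    lexical: dict mapping (nonterminal, word) -> P(nonterminal -> word)
    binary:  dict mapping (parent, left, right) -> P(parent -> left right)
    Returns P(words | grammar), summing over all parses, in O(n^3) time
    (times a grammar-dependent constant).
    """
    n = len(words)
    # beta[i][j][A] = inside probability of nonterminal A spanning words[i..j]
    beta = [[defaultdict(float) for _ in range(n)] for _ in range(n)]

    # Base case: spans of length 1 come from lexical rules A -> w.
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                beta[i][i][A] += p

    # Recursive case: combine adjacent spans with binary rules A -> B C.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split point between left and right child
                for (A, B, C), p in binary.items():
                    left = beta[i][k].get(B, 0.0)
                    right = beta[k + 1][j].get(C, 0.0)
                    if left and right:
                        beta[i][j][A] += p * left * right

    return beta[0][n - 1].get(start, 0.0)

# Toy grammar (hypothetical): S -> NP VP, VP -> V NP, plus lexical rules.
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
lexical = {("NP", "we"): 0.5, ("NP", "models"): 0.5, ("V", "present"): 1.0}
print(inside_probability(["we", "present", "models"], lexical, binary))  # 0.25
```

The paper's contribution is to keep this cubic-time dynamic-programming structure while the EM updates additionally account for the n‐gram and PLSA components of the composite model; the details of that coupling are beyond this sketch.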
