Trans-dimensional Random Fields for Language Modeling

Bin Wang, Zhiqiang Tan, Zhijian Ou

doi:10.3115/v1/p15-1076

Abstract

Language modeling (LM) involves determining the joint probability of words in a sentence. The conditional approach is dominant, representing the joint probability in terms of conditionals; examples include n-gram LMs and neural network LMs. An alternative approach, called the random field (RF) approach, is used in whole-sentence maximum entropy (WSME) LMs. Although the RF approach has potential benefits, the empirical results of previous WSME models have not been satisfactory. In this paper, we revisit the RF approach for language modeling, with a number of innovations. We propose a trans-dimensional RF (TDRF) model and develop a training algorithm using joint stochastic approximation and trans-dimensional mixture sampling. We perform speech recognition experiments on Wall Street Journal data, and find that our TDRF models perform as well as recurrent neural network LMs while being computationally more efficient in computing sentence probability.
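The abstract does not spell out the model's form. As a minimal sketch, assume a TDRF scores a (length, sentence) pair with a length prior π_l, a feature vector f(x^l) weighted by λ, and a per-length log-normalizer ζ_l that approximates log Z_l(λ), so that log p(l, x^l; λ, ζ) = log π_l + λ·f(x^l) − ζ_l. The feature function (word bigrams with a `<s>` start marker) and the names `bigram_features` and `tdrf_log_prob` are hypothetical illustrations, not the authors' code.

```python
import math
from collections import Counter

def bigram_features(words):
    """Hypothetical feature function f(x^l): word-bigram counts with a <s> start marker."""
    padded = ["<s>"] + list(words)
    return Counter(zip(padded[:-1], padded[1:]))

def tdrf_log_prob(words, weights, log_pi, zeta):
    """log p(l, x^l; lambda, zeta) = log pi_l + lambda . f(x^l) - zeta_l.

    weights: dict mapping a feature (bigram tuple) to its lambda value
    log_pi:  dict mapping a length l to log pi_l (the length prior)
    zeta:    dict mapping a length l to zeta_l, an estimate of log Z_l(lambda)
    """
    l = len(words)
    score = sum(weights.get(feat, 0.0) * count
                for feat, count in bigram_features(words).items())
    return log_pi[l] + score - zeta[l]

# Usage with made-up parameters (not values from the paper):
weights = {("<s>", "the"): 0.5, ("the", "cat"): 1.0}
log_pi = {1: math.log(0.5), 2: math.log(0.5)}
zeta = {1: 0.0, 2: 0.2}
print(tdrf_log_prob(["the", "cat"], weights, log_pi, zeta))
```

Once ζ has been estimated, scoring a sentence in this sketch needs only one feature lookup per position plus two per-length table lookups, with no normalization over the vocabulary at each word; this illustrates one plausible reason the RF form makes sentence probabilities cheap to compute.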

Highlights

Language modeling is crucial for a variety of computational linguistic applications, such as speech recognition, machine translation, handwriting recognition, and information retrieval.
We explore the use of a variety of features based on word and class information in trans-dimensional random field (TDRF) language modeling (LM).
We describe a trans-dimensional mixture sampling algorithm to simulate from the joint distribution p(l, x^l; λ, ζ), which is used with (λ, ζ) = (λ^(t−1), ζ^(t−1)) at iteration t for Markov chain Monte Carlo (MCMC) sampling in the joint stochastic approximation (SA) algorithm.
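One way to build a sampler of this kind is to alternate a dimension-changing Metropolis-Hastings "local jump" (add or delete the last word) with within-length Gibbs sweeps, holding (λ, ζ) fixed. The sketch below follows that pattern under simplifying assumptions that may differ from the paper: the toy model uses only word-bigram features, the length proposal Γ(l, ·) is uniform over the neighboring lengths l ± 1, and the word added in a birth move is drawn uniformly from the vocabulary. For a birth move the acceptance probability is min{1, p(l+1, x') Γ(l+1, l) / [p(l, x) Γ(l, l+1) g(y)]}, where g is the word proposal, and a death move uses the reciprocal form. `ToyTDRF`, `local_jump`, `gibbs_sweep` and `td_mixture_sample` are hypothetical names; the SA update of (λ, ζ) that would consume these samples is not shown.

```python
import math
import random
from dataclasses import dataclass, field

@dataclass
class ToyTDRF:
    """Hypothetical toy TDRF: word-bigram features only, lengths 1..max_len."""
    vocab: list
    log_pi: dict                                  # log pi_l, the length prior
    zeta: dict                                    # zeta_l, current estimate of log Z_l
    weights: dict = field(default_factory=dict)   # lambda, keyed by (prev_word, word)

    @property
    def max_len(self):
        return max(self.log_pi)

    def log_target(self, words):
        """Unnormalized log p(l, x^l; lambda, zeta) = log pi_l + lambda . f(x^l) - zeta_l."""
        padded = ["<s>"] + words
        score = sum(self.weights.get(p, 0.0) for p in zip(padded[:-1], padded[1:]))
        return self.log_pi[len(words)] + score - self.zeta[len(words)]

def neighbors(l, max_len):
    """Lengths reachable by one local jump; Gamma(l, .) is uniform over them."""
    return [m for m in (l - 1, l + 1) if 1 <= m <= max_len]

def accept(log_ratio, rng):
    # Compare in probability space so rng.random() == 0.0 never reaches math.log.
    return rng.random() < math.exp(min(0.0, log_ratio))

def local_jump(words, model, rng):
    """Step I: Metropolis-Hastings move that adds or deletes the last word."""
    l = len(words)
    nbrs = neighbors(l, model.max_len)
    if not nbrs:
        return words
    new_l = rng.choice(nbrs)
    log_fwd = -math.log(len(nbrs))                               # log Gamma(l, new_l)
    log_back = -math.log(len(neighbors(new_l, model.max_len)))   # log Gamma(new_l, l)
    log_g = -math.log(len(model.vocab))   # uniform word proposal (a simplification)
    if new_l == l + 1:
        proposal = words + [rng.choice(model.vocab)]
        log_ratio = (model.log_target(proposal) + log_back) - (
            model.log_target(words) + log_fwd + log_g)
    else:
        proposal = words[:-1]
        log_ratio = (model.log_target(proposal) + log_back + log_g) - (
            model.log_target(words) + log_fwd)
    return proposal if accept(log_ratio, rng) else words

def gibbs_sweep(words, model, rng):
    """Step II: resample each position from its full conditional at fixed length."""
    words = list(words)
    for i in range(len(words)):
        logs = []
        for w in model.vocab:
            words[i] = w
            logs.append(model.log_target(words))
        top = max(logs)
        words[i] = rng.choices(model.vocab, weights=[math.exp(v - top) for v in logs])[0]
    return words

def td_mixture_sample(model, n_steps, seed=0):
    """Alternate local jumps and Gibbs sweeps, collecting one sample per step."""
    rng = random.Random(seed)
    words = [rng.choice(model.vocab)]
    samples = []
    for _ in range(n_steps):
        words = gibbs_sweep(local_jump(words, model, rng), model, rng)
        samples.append(words)
    return samples

# Usage: with zero weights and zeta_l = l * log|V|, the target's length marginal equals pi_l.
V = ["a", "b", "c"]
toy = ToyTDRF(vocab=V,
              log_pi={l: math.log(1 / 3) for l in (1, 2, 3)},
              zeta={l: l * math.log(3) for l in (1, 2, 3)})
samples = td_mixture_sample(toy, n_steps=5000, seed=1)
```

The zero-weight setting in the usage lines is a convenient sanity check: because ζ_l then equals the exact log Z_l, the fraction of samples at each length should approach the uniform prior π_l = 1/3.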

Summary

Introduction

Language modeling is crucial for a variety of computational linguistic applications, such as speech recognition, machine translation, handwriting recognition, and information retrieval. It involves determining the joint probability p(x) of a sentence x, which can be denoted as a pair x = (l, x^l), where l is the length and x^l = (x_1, ..., x_l) is the sequence of l words. […] Neural network LMs, which have begun to surpass the traditional n-gram LMs, follow the conditional modeling approach, with φ(h_i) determined by a neural network (NN), which can be either a feedforward NN (Schwenk, 2007) or a recurrent NN (Mikolov et al., 2011).
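For contrast with the RF approach, the conditional approach factorizes the sentence probability with the chain rule, p(x^l) = ∏_{i=1}^{l} p(x_i | h_i), where h_i = (x_1, ..., x_{i−1}) is the history. The minimal sketch below truncates h_i to the previous word (a bigram LM) and uses a hypothetical hand-filled probability table; the name `conditional_log_prob` is illustrative. In a neural network LM the table lookup would be replaced by a network computing each conditional, which must be normalized over the whole vocabulary at every position.

```python
import math

def conditional_log_prob(words, cond_prob):
    """Chain rule: log p(x^l) = sum_i log p(x_i | h_i), with h_i truncated to the
    previous word. An end marker </s> lets the model assign probability to the length."""
    history = "<s>"
    total = 0.0
    for w in list(words) + ["</s>"]:
        dist = cond_prob[history]
        # Each conditional must be a normalized distribution over the next token.
        if abs(sum(dist.values()) - 1.0) > 1e-9:
            raise ValueError(f"p(. | {history!r}) is not normalized")
        total += math.log(dist[w])
        history = w
    return total

# Usage with a made-up bigram table:
table = {
    "<s>": {"the": 1.0},
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}
print(conditional_log_prob(["the", "cat"], table))
```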


Publication Date: Jan 1, 2015
Citations: 36
License type: cc-by

Similar Papers

Joint unsupervised adaptation of n-gram and RNN language models via LDA-based hybrid mixture modeling
Ryo Masumura ... Hirokazu Masataki
01 Dec 2017

On efficient training of word classes and their application to recurrent neural network language models
Rami Botros ... Kazuki Irie
06 Sep 2015

Improving handwritten Chinese text recognition using neural network language models and convolutional neural network shape models
Yi-Chao Wu ... Cheng-Lin Liu
Pattern Recognition, Vol. 65, 29 Dec 2016

Investigation of back-off based interpolation between recurrent neural network and n-gram language models
X Chen ... P C Woodland
01 Dec 2015
