Abstract

Due to their advantages over conventional n-gram language models, recurrent neural network language models (RNNLMs) have recently attracted considerable research attention in the speech recognition community. In this paper, we explore one advantage of RNNLMs, namely the ease with which they allow the integration of additional knowledge sources. We concentrate on features that provide complementary information with respect to the lexical identities of the words; we refer to such information as meta-information. We single out three cases and investigate their merits by means of N-best list re-scoring experiments on a challenging corpus of spoken Dutch (referred to as CGN) as well as on the English Wall Street Journal (WSJ) corpus. First, we look at part-of-speech (POS) tags and lemmas, two sources of word-level linguistic information that are known to contribute to the performance of conventional language models. We confirm that RNNLMs can benefit from these sources as well. Second, we investigate socio-situational settings (SSSs) and topics, two sources of discourse-level information that are also known to benefit language models. SSSs are present in the CGN data and can be seen as a proxy for the language register. For the purposes of our investigation, we assume that information on the SSS can be captured at the moment at which the speech is recorded. Topics, i.e., treatments of different subjects, are present in the WSJ data. In order to predict POS tags, lemmas, SSSs, and topics, a second RNNLM is coupled to the main RNNLM; we refer to this architecture as a recurrent neural network tandem language model (RNNTLM). Our experimental findings show that if high-quality meta-information labels are available, both word-level and discourse-level information improve the performance of language models. Third, we investigate sentence length and word length (i.e., token size), two sources of intrinsic information that are readily available for exploitation because they are known at the time of re-scoring. Intrinsic information has been largely overlooked in language modeling research. The results of experiments on both the CGN and WSJ data show that integrating sentence length and word length yields improvements. RNNLMs allow these features to be incorporated with ease and thereby achieve improved performance.
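
To make the feature-integration idea concrete, the following minimal sketch (in PyTorch; all class names, dimensions, and hyperparameters are illustrative assumptions, and it is not the paper's RNNTLM, which couples a second RNNLM to predict the meta labels rather than assuming they are given) shows the general mechanism the abstract describes: a meta-information label such as a POS tag or SSS is embedded and concatenated with the word embedding at each time step, and the resulting model score can be used for N-best list re-scoring.

    # Minimal sketch, assuming meta-information labels (e.g. POS tags) are
    # already available for each word; names and sizes are hypothetical.
    import torch
    import torch.nn as nn

    class MetaRNNLM(nn.Module):
        def __init__(self, vocab_size, n_meta, word_dim=128, meta_dim=16, hidden_dim=256):
            super().__init__()
            self.word_emb = nn.Embedding(vocab_size, word_dim)
            self.meta_emb = nn.Embedding(n_meta, meta_dim)  # e.g. POS, lemma, or SSS ids
            # The recurrent layer sees the concatenated word + meta embedding.
            self.rnn = nn.GRU(word_dim + meta_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, words, meta, hidden=None):
            # words, meta: (batch, seq_len) integer ids
            x = torch.cat([self.word_emb(words), self.meta_emb(meta)], dim=-1)
            h, hidden = self.rnn(x, hidden)
            return self.out(h), hidden  # logits over the next word

    def score_hypothesis(model, word_ids, meta_ids):
        # Sum of next-word log-probabilities for one hypothesis, the quantity
        # an N-best re-scorer would combine with the acoustic score.
        logits, _ = model(word_ids[:, :-1], meta_ids[:, :-1])
        logp = torch.log_softmax(logits, dim=-1)
        return logp.gather(-1, word_ids[:, 1:].unsqueeze(-1)).sum().item()

In an N-best re-scoring setup, this language-model score would be interpolated with the acoustic score of each hypothesis and the highest-scoring hypothesis selected; intrinsic features such as sentence length and word length could be injected through the same concatenation mechanism, since they are known at re-scoring time.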
