Recurrent Neural Network Language Model Adaptation for Multi-Genre Broadcast Speech Recognition and Alignment

Salil Deena, Madina Hasan, Oscar Saz, Mortaza Doulaty, Thomas Hain

doi:10.1109/taslp.2018.2888814

Abstract

Recurrent neural network language models (RNNLMs) generally outperform $n$-gram language models when used in automatic speech recognition (ASR). Adapting RNNLMs to new domains is an open problem, and current approaches can be categorised as either feature-based or model-based. In feature-based adaptation, the input to the RNNLM is augmented with auxiliary features, whilst model-based adaptation includes model fine-tuning and the introduction of adaptation layer(s) in the network. In this paper, the properties of both types of adaptation are investigated on multi-genre broadcast speech recognition. Existing techniques for both types of adaptation are reviewed, and the proposed techniques for model-based adaptation, namely the linear hidden network adaptation layer and the $K$-component adaptive RNNLM, are investigated. Moreover, new features derived from the acoustic domain are investigated for RNNLM adaptation. The contributions of this paper include two hybrid adaptation techniques: the fine-tuning of feature-based RNNLMs and a feature-based adaptation layer. In addition, the semi-supervised adaptation of RNNLMs using genre information is proposed. The ASR systems were trained using 700 h of multi-genre broadcast speech. The gains obtained with the proposed RNNLM adaptation techniques are consistent for RNNLMs trained on an in-domain set of 10M words and on a combination of in-domain and out-of-domain sets totalling 660M words, with approximately $10\%$ perplexity and $2\%$ relative word error rate improvements on a 28.3 h test set. The best RNNLM adaptation techniques for ASR are also evaluated on a lightly supervised alignment of subtitles task for the same data, where RNNLM adaptation leads to an absolute increase in the F-measure of $0.5\%$.
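To make the two families of adaptation concrete, the following is a minimal sketch of a single forward step of an Elman-style RNNLM that combines both ideas: feature-based adaptation, where an auxiliary feature vector (for example a genre indicator) is appended to the one-hot word input, and a linear hidden network (LHN) adaptation layer inserted between the hidden and output layers. It is an illustration under stated assumptions, not the authors' implementation. The class and method names (`AdaptedRNNLM`, `step`, `perplexity`) are hypothetical. The sigmoid hidden layer, the identity initialisation of the LHN matrix `A`, and the choice to feed the unadapted hidden state back into the recurrence are also assumptions. Some feature-based variants connect the auxiliary feature to the output layer as well, and the $K$-component adaptive RNNLM is not shown.

```python
import math
import random


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(W, x):
    # Dense matrix-vector product with W stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class AdaptedRNNLM:
    """Hypothetical toy RNNLM with feature-augmented input and an LHN layer."""

    def __init__(self, vocab_size, hidden_size, feat_size, seed=0):
        rng = random.Random(seed)
        self.V, self.H, self.F = vocab_size, hidden_size, feat_size
        # Feature-based adaptation: input weights act on [one-hot word ; auxiliary feature].
        self.W_in = rand_matrix(hidden_size, vocab_size + feat_size, rng)
        self.W_rec = rand_matrix(hidden_size, hidden_size, rng)
        self.W_out = rand_matrix(vocab_size, hidden_size, rng)
        # Assumed LHN layer: identity-initialised, so before adaptation the
        # model behaves exactly as if the layer were absent.
        self.A = [[1.0 if i == j else 0.0 for j in range(hidden_size)]
                  for i in range(hidden_size)]

    def step(self, word_id, feat, h_prev):
        # Build the augmented input vector for the previous word.
        x = [0.0] * self.V
        x[word_id] = 1.0
        x = x + list(feat)
        pre = [a + b for a, b in zip(matvec(self.W_in, x), matvec(self.W_rec, h_prev))]
        h = [1.0 / (1.0 + math.exp(-v)) for v in pre]
        # LHN adaptation layer sits between the hidden and output layers.
        h_adapted = matvec(self.A, h)
        probs = softmax(matvec(self.W_out, h_adapted))
        # The unadapted hidden state is carried forward in the recurrence (assumption).
        return probs, h

    def perplexity(self, words, feat):
        # Perplexity of a word-id sequence under a fixed auxiliary feature.
        h = [0.0] * self.H
        log_prob = 0.0
        for prev, nxt in zip(words[:-1], words[1:]):
            probs, h = self.step(prev, feat, h)
            log_prob += math.log(probs[nxt])
        return math.exp(-log_prob / (len(words) - 1))


if __name__ == "__main__":
    lm = AdaptedRNNLM(vocab_size=5, hidden_size=4, feat_size=3, seed=1)
    genre_a = [1.0, 0.0, 0.0]  # hypothetical one-hot genre feature
    print(lm.perplexity([0, 1, 2, 3, 4], genre_a))
```

In this sketch, model-based adaptation would update only `A` on in-genre data while keeping the other weights frozen. Fine-tuning would instead update all weights. Feature-based adaptation needs no retraining at test time: changing `feat` alone changes the predicted distribution.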
