Abstract
Recurrent neural network language models (RNNLMs) generally outperform $n$-gram language models when used in automatic speech recognition (ASR). Adapting RNNLMs to new domains is an open problem, and current approaches can be categorised as either feature-based or model-based. In feature-based adaptation, the input to the RNNLM is augmented with auxiliary features, whilst model-based adaptation includes model fine-tuning and the introduction of adaptation layer(s) in the network. In this paper, the properties of both types of adaptation are investigated on multi-genre broadcast speech recognition. Existing techniques for both types of adaptation are reviewed, and the proposed techniques for model-based adaptation, namely the linear hidden network (LHN) adaptation layer and the $K$-component adaptive RNNLM, are investigated. Moreover, new features derived from the acoustic domain are investigated for RNNLM adaptation. The contributions of this paper include two hybrid adaptation techniques: the fine-tuning of feature-based RNNLMs and a feature-based adaptation layer. Moreover, the semi-supervised adaptation of RNNLMs using genre information is also proposed. The ASR systems were trained using 700 h of multi-genre broadcast speech. The gains obtained when using the RNNLM adaptation techniques proposed in this paper are consistent when using RNNLMs trained on an in-domain set of 10M words and on a combination of in-domain and out-of-domain sets of 660M words, with approximately 10% perplexity and 2% relative word error rate improvements on a 28.3 h test set. The best RNNLM adaptation techniques for ASR are also evaluated on a lightly supervised alignment of subtitles task for the same data, where the use of RNNLM adaptation leads to an absolute increase in the F-measure of 0.5%.
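To make the two adaptation families concrete, the sketch below (a minimal PyTorch-style illustration, not the authors' implementation; all class and parameter names are hypothetical) shows how auxiliary features can be appended to the RNNLM input (feature-based adaptation) and how an identity-initialised linear hidden network (LHN) layer can be inserted after the recurrent layer and fine-tuned on in-domain data while the rest of the network is frozen (model-based adaptation).

```python
# Minimal sketch of the two adaptation families discussed in the abstract.
# Assumes PyTorch; dimensions and names are illustrative only.
import torch
import torch.nn as nn


class AdaptableRNNLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, feat_dim=0):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Feature-based adaptation: auxiliary features (e.g. genre or
        # acoustically derived embeddings) are concatenated with the word
        # embedding at the input layer.
        self.rnn = nn.LSTM(embed_dim + feat_dim, hidden_dim, batch_first=True)
        # Model-based adaptation: an LHN layer initialised to the identity,
        # so the unadapted model is unchanged before fine-tuning.
        self.lhn = nn.Linear(hidden_dim, hidden_dim)
        nn.init.eye_(self.lhn.weight)
        nn.init.zeros_(self.lhn.bias)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, words, feats=None):
        x = self.embed(words)                 # (batch, time, embed_dim)
        if feats is not None:                 # (batch, time, feat_dim)
            x = torch.cat([x, feats], dim=-1)
        h, _ = self.rnn(x)
        h = self.lhn(h)                       # adaptation layer
        return self.out(h)                    # next-word logits


def freeze_for_lhn_adaptation(model):
    # During LHN adaptation, only the adaptation layer is updated;
    # all other parameters stay frozen.
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith("lhn.")
```

Hybrid adaptation in this framing would combine the two, e.g. fine-tuning a model that already receives auxiliary features, or conditioning the adaptation layer itself on those features.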