This paper addresses the issue of language model adaptation for recurrent neural network language models (RNNLMs), which have recently emerged as a state-of-the-art method for language modeling in speech recognition. Curriculum learning is an established machine learning approach that achieves better models by applying a curriculum, i.e., a well-planned ordering of the training data, during the learning process. Our contribution is to demonstrate the importance of curriculum learning methods for adapting RNNLMs and to provide key insights on how they should be applied. RNNLMs model language in a continuous space and can theoretically exploit word-dependency information over arbitrarily long distances. These characteristics give RNNLMs the ability to learn patterns robustly with relatively little training data, implying that they are well suited for adaptation. In this paper, we focus on two related challenges facing language models: within-domain adaptation and limited-data within-domain adaptation. We propose three types of curricula that start with general data, i.e., data characterizing the domain as a whole, and move towards specific data, i.e., data characterizing the sub-domain targeted for adaptation. Effectively, these curricula result in a model that can be considered an implicit interpolation between general data and sub-domain-specific data. We carry out an extensive set of experiments investigating how adapting RNNLMs using curriculum learning can improve their performance.

Our first set of experiments addresses the within-domain adaptation challenge, i.e., creating models adapted to specific sub-domains that are part of a larger, heterogeneous domain of speech data. Under this challenge, all training data is available to the system at the time the language model is trained. First, we demonstrate that curriculum learning can be used to create effective sub-domain-adapted RNNLMs. Second, we show that a combination of sub-domain-adapted RNNLMs can be used if the sub-domain of the target data is unknown at test time. Third, we explore the potential of applying combinations of sub-domain-adapted RNNLMs to data for which sub-domain information is unknown at training time and must be inferred.

Our second set of experiments addresses limited-data within-domain adaptation, i.e., adapting an existing model trained on a large data set using a smaller amount of data from the target sub-domain. Under this challenge, data from the target sub-domain is not available at the time the language model is trained, but rather becomes available little by little over time. We demonstrate that the implicit interpolation carried out by applying curriculum learning methods to RNNLMs outperforms conventional interpolation and has the potential to make better use of limited adaptation data.
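The general-to-specific curriculum described above can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes a toy PyTorch LSTM language model and placeholder "general" and "sub-domain" corpora, and simply runs a training pass over the general data before a pass over the sub-domain data, so the final weights reflect an implicit interpolation biased towards the sub-domain.

```python
# Minimal sketch (illustrative only): curriculum-ordered adaptation of a tiny
# RNN language model, training on general-domain text first and then on
# sub-domain text. Corpora, vocabulary, and hyperparameters are placeholders.
import torch
import torch.nn as nn

# Toy corpora standing in for "general" and "sub-domain" training data.
general_corpus = ["the meeting starts now", "we discuss the agenda", "the report is ready"]
subdomain_corpus = ["the patient reports pain", "the diagnosis is confirmed"]

# Shared vocabulary over both corpora.
words = sorted({w for s in general_corpus + subdomain_corpus for w in s.split()})
vocab = {w: i for i, w in enumerate(words)}

def encode(sentence):
    return torch.tensor([vocab[w] for w in sentence.split()], dtype=torch.long)

class TinyRNNLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.out(h)

model = TinyRNNLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Curriculum: one pass over general data, then one pass over sub-domain data,
# so the most specific examples are seen last.
curriculum = [("general", general_corpus), ("sub-domain", subdomain_corpus)]
for stage, corpus in curriculum:
    for sentence in corpus:
        ids = encode(sentence).unsqueeze(0)          # shape: (1, seq_len)
        inputs, targets = ids[:, :-1], ids[:, 1:]    # next-word prediction
        logits = model(inputs)
        loss = loss_fn(logits.reshape(-1, len(vocab)), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"finished {stage} stage, last loss = {loss.item():.3f}")
```

In a real setting the two stages would iterate over large corpora for multiple epochs; the point of the sketch is only the ordering of the data, which is what distinguishes a curriculum from conventional shuffled training.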