Abstract

Representation models pre-trained on unlabeled data show competitive performance in speech recognition, even when fine-tuned on small amounts of labeled data. The continual representation learning (CTRL) framework combines pre-training and continual learning methods to obtain powerful representations. CTRL relies on two neural networks, an online model and an offline model, where the fixed offline model transfers information to the online model through a continual learning loss. In this paper, we present momentum continual representation learning (M-CTRL), a framework that slowly updates the offline model with an exponential moving average of the online model. Our framework aims to capture information from an offline model that is improved on both past and new domains. To evaluate our framework, we continually pre-train wav2vec 2.0 with M-CTRL on the following datasets in order: LibriSpeech, Wall Street Journal, and TED-LIUM V3. Our experiments demonstrate that M-CTRL improves performance on the new domain and reduces information loss on the past domains compared to CTRL.
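The momentum update described above can be written as theta_offline <- m * theta_offline + (1 - m) * theta_online. The PyTorch-style sketch below illustrates this exponential-moving-average step; it is not the authors' released code, and the function name, the momentum value of 0.999, and the wav2vec 2.0 pairing in the usage comments are illustrative assumptions.

```python
import copy
import torch

def ema_update(online: torch.nn.Module, offline: torch.nn.Module, momentum: float = 0.999) -> None:
    """Update the offline model as an exponential moving average of the online model.

    Implements: theta_offline <- momentum * theta_offline + (1 - momentum) * theta_online
    (momentum=0.999 is an assumed value for illustration, not the paper's hyperparameter).
    """
    with torch.no_grad():
        for p_off, p_on in zip(offline.parameters(), online.parameters()):
            p_off.mul_(momentum).add_(p_on, alpha=1.0 - momentum)

# Hypothetical usage (model construction details are assumptions):
# online = build_wav2vec2_model()        # online model, updated by gradient descent
# offline = copy.deepcopy(online)        # offline model, never updated by gradients
# for p in offline.parameters():
#     p.requires_grad_(False)
#
# After each optimizer step on the online model:
# ema_update(online, offline, momentum=0.999)
```

In this sketch the offline model receives no gradients; it changes only through the slow EMA step, so it retains information from earlier domains while gradually absorbing what the online model learns on the new domain.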
