Abstract

Accumulating knowledge to tackle new tasks without necessarily forgetting the old ones is a hallmark of human-like intelligence. The dominant paradigm in machine learning, however, is still to train a model that works well on a static dataset. When tasks are learned in a stream where the data distribution may fluctuate, fitting a new task often leads to forgetting of the previous ones. We propose a simple yet effective framework that continually learns natural language understanding tasks with one model. Because the framework distills knowledge and replays experience from previous tasks while fitting a new task, we name it DnR (distill and replay). DnR is based on language models and can be built smoothly on different language model architectures. Experimental results demonstrate that DnR outperforms previous state-of-the-art models in continually learning tasks of the same type but from different domains, as well as tasks of different types. With the distillation method, we further show that DnR can incrementally compress the model size while still outperforming most of the baselines. We hope that DnR will promote the practical application of continual language learning and contribute to building human-level language intelligence that is minimally affected by catastrophic forgetting.
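To make the distill-and-replay idea concrete, the sketch below shows one possible training step that combines a standard loss on the new task with a distillation loss on replayed examples from earlier tasks. This is a minimal illustration assuming a PyTorch-style setup; the names (prev_model, replay_batch, alpha, temperature) and the exact form of the objective are assumptions for exposition, not the paper's actual implementation.

```python
# Illustrative sketch of a distill-and-replay training step (PyTorch).
# All names and hyperparameters here are assumptions for illustration;
# the paper's actual objective and implementation may differ.
import torch
import torch.nn.functional as F

def dnr_step(model, prev_model, optimizer, new_batch, replay_batch,
             alpha=0.5, temperature=2.0):
    """One optimization step: new-task loss + distillation on replayed data."""
    model.train()
    optimizer.zero_grad()

    # Cross-entropy on the current task's data.
    x_new, y_new = new_batch
    new_loss = F.cross_entropy(model(x_new), y_new)

    # Distillation on replayed examples: match the frozen previous model's
    # softened output distribution via temperature-scaled KL divergence.
    x_old, _ = replay_batch
    with torch.no_grad():
        teacher_logits = prev_model(x_old)
    student_log_probs = F.log_softmax(model(x_old) / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    distill_loss = F.kl_div(student_log_probs, teacher_probs,
                            reduction="batchmean") * (temperature ** 2)

    # Weighted combination of fitting the new task and preserving old ones.
    loss = (1 - alpha) * new_loss + alpha * distill_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the incremental-compression setting mentioned above, the student model could presumably be instantiated with fewer parameters than prev_model before applying the same step, so that each distillation round also shrinks the model.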

Highlights

  • Humans and many advanced animals can learn new tasks without necessarily forgetting the old ones (Glenberg, 1997; Zenke et al., 2017)

  • We will first give an overview of different models' continual learning ability and evaluate whether they are robust to variation in the task order of a sequence

  • We propose Distill and Replay (DnR), a simple yet effective framework for continual language learning


Summary

Introduction

Humans and many advanced animals can learn new tasks without necessarily forgetting the old ones (Glenberg, 1997; Zenke et al., 2017). This ability to continuously learn, accumulate knowledge, and reuse it to tackle new challenges throughout the lifespan is a critical requirement for human-like intelligence. When tasks are learned in a stream where the data distribution may shift, models generally fail to isolate acquired knowledge and forget previously learned tasks. This phenomenon is known as catastrophic forgetting. Most existing methods for mitigating it have only been applied to computer vision tasks.


