Abstract

Lifelong language learning enables a language model to accumulate knowledge by training on a stream of text data. Recent work on lifelong language learning replays samples of previous tasks drawn from an episodic memory or a generative model. LAMOL, a representative generative-replay model for lifelong language learning, preserves previous knowledge through generated pseudo-old samples, but these samples are suboptimal. In this paper, we propose MFK-LAMOL, an improved version of LAMOL that constructs the generative replay more effectively. When a new task arrives, MFK-LAMOL generates a sufficient amount of previous data and retrieves the important examples to train alongside the new task. Specifically, it selects the examples containing the most forgotten knowledge, measured by the extent to which the knowledge learned from previous tasks has been lost after learning the new information. We show that the proposed method outperforms LAMOL on a stream of three different natural language processing tasks.
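
As an illustration of the selection step described above, the following sketch ranks old examples by how much their loss increases after the model is updated on the new task and keeps those whose previously learned knowledge appears most forgotten. It assumes a Hugging Face-style causal language model; the helper names (per_example_loss, select_most_forgotten) and the loss-increase criterion are illustrative assumptions, not the paper's exact scoring function.

    import torch

    def per_example_loss(model, tokenizer, text, device="cpu"):
        """Cross-entropy of one text example under a causal LM.

        Assumes a Hugging Face-style model that accepts `labels` and
        returns an object with a `.loss` field; the helper itself is
        hypothetical, not taken from the MFK-LAMOL paper.
        """
        ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
        with torch.no_grad():
            out = model(ids, labels=ids)
        return out.loss.item()

    def select_most_forgotten(model_before, model_after, tokenizer, old_examples, k):
        """Rank pseudo-old examples by their loss increase after new-task
        training and return the top-k; a larger increase suggests the
        knowledge needed for that example has been forgotten the most."""
        scored = []
        for ex in old_examples:
            gap = (per_example_loss(model_after, tokenizer, ex)
                   - per_example_loss(model_before, tokenizer, ex))
            scored.append((gap, ex))
        # Sort in descending order of the forgetting score and keep the top-k.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [ex for _, ex in scored[:k]]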

Highlights

  • Artificial neural networks have surpassed human performance on narrow tasks such as image recognition, video game playing, and voice recognition [1]–[3]

  • We focus on LAMOL, which prevents catastrophic forgetting through generative replay

  • TASK tokens are task-specific tokens, and every previous task receives an equal share of the generated pseudo-samples. These samples are likely to be suboptimal because LAMOL generates only a fixed number of samples and uses all of them when training the new task (a minimal sketch of this equal allocation appears below)
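
The sketch below illustrates that allocation: a GPT-2-style language model is conditioned on a task-specific TASK token and generates an equal share of pseudo-old samples for every previous task. It assumes the Hugging Face transformers API; the task token strings, the function name generate_pseudo_samples, and the sampling settings are illustrative placeholders rather than the configuration used in the paper.

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    # Load GPT-2 and register one special TASK token per task
    # (the token strings here are illustrative placeholders).
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    task_tokens = ["[TASK1]", "[TASK2]", "[TASK3]"]
    tokenizer.add_special_tokens({"additional_special_tokens": task_tokens})
    model.resize_token_embeddings(len(tokenizer))

    def generate_pseudo_samples(model, tokenizer, prev_task_tokens, total_budget):
        """Give every previous task an equal share of the generation budget
        and sample pseudo-old examples from the LM conditioned on each
        task's TASK token (a sketch of LAMOL-style generative replay)."""
        share = total_budget // len(prev_task_tokens)
        samples = {}
        for token in prev_task_tokens:
            prompt = tokenizer(token, return_tensors="pt").input_ids
            outputs = model.generate(
                prompt,
                max_length=128,
                do_sample=True,
                top_k=20,
                num_return_sequences=share,
                pad_token_id=tokenizer.eos_token_id,
            )
            samples[token] = [tokenizer.decode(o) for o in outputs]
        return samples

    # Example: reserve 200 pseudo-samples in total, split equally over two
    # previous tasks (100 each).
    # replay = generate_pseudo_samples(model, tokenizer, ["[TASK1]", "[TASK2]"], 200)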


Summary

INTRODUCTION

Artificial neural networks have surpassed human performance on narrow tasks such as image recognition, video game playing, and voice recognition [1]–[3]. While humans can continually learn new skills and accumulate knowledge throughout their lifetimes [5], neural networks suffer from catastrophic forgetting: training a model on new information interferes with previously learned knowledge [6], [7]. Various approaches to continual lifelong learning have been proposed to alleviate this problem. We focus on lifelong language learning (LLL), in which a model addresses a stream of NLP tasks. We propose MFK-LAMOL, an extension of LAMOL with an efficient generative-replay construction method that ensures the pseudo-old samples contain the knowledge most forgotten from previous tasks.

RELATED WORK
RESULTS
Findings
CONCLUSION