Continual pre-training mitigates forgetting in language and vision

Andrea Cossu, Antonio Carta, Lucia Passaro, Vincenzo Lomonaco, Tinne Tuytelaars, Davide Bacciu

doi:10.1016/j.neunet.2024.106492

Abstract

Pre-trained models are commonly used in Continual Learning to initialize the model before training on the stream of non-stationary data. However, pre-training is rarely applied during Continual Learning. We investigate the characteristics of the Continual Pre-Training scenario, where a model is continually pre-trained on a stream of incoming data and only later fine-tuned on different downstream tasks. We introduce an evaluation protocol for Continual Pre-Training that monitors forgetting against a Forgetting Control dataset not present in the continual stream. We disentangle the impact on forgetting of three main factors: the input modality (NLP, Vision), the architecture type (Transformer, ResNet), and the pre-training protocol (supervised, self-supervised). Moreover, we propose a Sample-Efficient Pre-training method (SEP) that speeds up the pre-training phase. We show that the pre-training protocol is the most important factor accounting for forgetting. Surprisingly, we find that self-supervised continual pre-training in both NLP and Vision is sufficient to mitigate forgetting without the use of any Continual Learning strategy. Other factors, such as model depth, input modality, and architecture type, are not as crucial.
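
The evaluation protocol described in the abstract can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`evaluate_continual_pretraining`, `forgetting`) and the two callables (`pretrain_step`, `finetune_and_score`) are hypothetical, and the assumption that the Forgetting Control score is recorded once before the stream and once after each pre-training step is ours.

```python
from typing import Callable, Iterable, List, TypeVar

Model = TypeVar("Model")
Data = TypeVar("Data")


def evaluate_continual_pretraining(
    model: Model,
    stream: Iterable[Data],
    pretrain_step: Callable[[Model, Data], Model],
    finetune_and_score: Callable[[Model], float],
) -> List[float]:
    """Return Forgetting Control scores: one baseline, then one per experience.

    Assumption: `finetune_and_score` fine-tunes a *copy* of the model on the
    Forgetting Control dataset (which never appears in the stream) and returns
    a score where higher is better. It must not modify the model it receives.
    """
    # Baseline: the original pre-trained model, before any continual pre-training.
    scores = [finetune_and_score(model)]
    for experience in stream:
        # Continue pre-training on the next chunk of the non-stationary stream.
        model = pretrain_step(model, experience)
        # Only afterwards fine-tune and evaluate on the Forgetting Control dataset.
        scores.append(finetune_and_score(model))
    return scores


def forgetting(scores: List[float]) -> List[float]:
    """Drop relative to the baseline score; positive values indicate forgetting."""
    baseline = scores[0]
    return [baseline - s for s in scores[1:]]
```

With real models, `pretrain_step` would run a supervised or self-supervised pre-training objective on one experience, and `finetune_and_score` would return, for example, held-out accuracy on the Forgetting Control dataset. A forgetting value that stays near zero across the stream corresponds to the behaviour the abstract reports for self-supervised continual pre-training.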

Journal: Neural Networks
Publication Date: Jul 1, 2024
Citations: 1
License type: cc-by

