Abstract

We study the online continual learning (CL) setting, in which a learner must continually learn from a sequence of tasks. In this setting, improving the learning ability of the model and mitigating catastrophic forgetting are the two pivotal challenges. Most existing approaches to online continual learning rely on experience replay: a memory buffer stores a subset of samples from previous tasks, and these samples are jointly trained with those of the current task to update the network parameters. However, most such methods simply produce feature embeddings with a shared feature extractor and train the network with a cross-entropy loss. We argue that they fail to fully exploit the feature embeddings and neglect the similarity relations between samples, which lowers discriminative performance, especially in the online setting. To this end, we propose the Adaptive Instance Similarity Embedding for Online Continual Learning (AISEOCL) framework, which takes the relations among all samples in a batch into account. Specifically, experience replay is first employed to avoid catastrophic forgetting. During training, an adaptive similarity embedding then extracts additional similarity information from the training batch, which combines samples from the current and previous tasks. Since not all samples are equally important for a prediction, we further weigh the importance of each instance with an attention mechanism. Importantly, we impose a similarity distillation loss on the similarity distributions produced by the current and previous models; this transfers the inter-sample similarity relations from the old model to the current one and thereby alleviates catastrophic forgetting. With this strategy, AISEOCL improves the learning ability of the model while enhancing its discriminative power, which also helps it resist forgetting more stably. Experiments on several standard benchmarks validate the effectiveness of the proposed approach.
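To make the similarity-distillation idea concrete, the following is a minimal sketch rather than the paper's actual implementation: it assumes the similarity distributions are temperature-scaled, row-normalized cosine similarities over a batch, and distills the frozen previous model's distribution into the current one via a KL divergence. The function name `similarity_distillation_loss` and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def similarity_distillation_loss(feat_new, feat_old, temperature=0.1):
    """Illustrative similarity distillation (an assumption, not the paper's exact loss).

    feat_new: (B, D) embeddings of the batch from the current model.
    feat_old: (B, D) embeddings of the same batch from the frozen previous model.
    Returns a scalar KL divergence between the two pairwise-similarity distributions.
    """
    # L2-normalize so that dot products are cosine similarities.
    z_new = F.normalize(feat_new, dim=1)
    z_old = F.normalize(feat_old, dim=1)

    # Pairwise similarity matrices (B x B), scaled by a temperature.
    sim_new = z_new @ z_new.t() / temperature
    sim_old = z_old @ z_old.t() / temperature

    # Suppress self-similarity with a large negative value so each row's
    # distribution is effectively over the other samples in the batch.
    mask = torch.eye(sim_new.size(0), dtype=torch.bool, device=sim_new.device)
    sim_new = sim_new.masked_fill(mask, -1e4)
    sim_old = sim_old.masked_fill(mask, -1e4)

    # Row-wise similarity distributions; distill the old model's relations
    # into the current model (the old distribution is the fixed target).
    log_p_new = F.log_softmax(sim_new, dim=1)
    p_old = F.softmax(sim_old, dim=1).detach()
    return F.kl_div(log_p_new, p_old, reduction="batchmean")
```

In a replay-based training loop, such a term would be added, with some weighting, to the cross-entropy loss computed on the mixed batch of current-task and buffer samples.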
