In continuous unsupervised domain adaptation (CUDA), deep learning models struggle with the stability-plasticity trade-off: retaining knowledge of earlier domains (stability) conflicts with adapting to new ones (plasticity), so a model typically has to forget old knowledge to acquire new knowledge. This paper introduces “Forget to Learn” (F2L), a novel framework that circumvents this trade-off. Whereas state-of-the-art (SOTA) methods try to balance the two conflicting objectives, F2L uses active forgetting and knowledge distillation to address the root causes of the conflict. F2L trains two encoders. The first, the ‘Specialist’, is designed to actively forget, which boosts its adaptability (i.e., plasticity) and lets it generate high-accuracy pseudo labels on new domains. These pseudo labels are then used to transfer and accumulate the Specialist’s knowledge in the second encoder, the ‘Generalist’, through conflict-free knowledge distillation. Empirical and ablation studies confirm F2L’s superiority on different datasets and against different SOTA methods. Furthermore, F2L reduces the need for hyperparameter tuning, improves computational and sample efficiency, and excels on problems with long domain sequences, all of which are key advantages for practical systems constrained by hardware limitations.
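To make the two-encoder idea concrete, the following is a minimal, illustrative sketch in plain Python and not the authors' implementation. The abstract does not specify how forgetting, pseudo labelling, or distillation are realised, so everything here is an assumption: `active_forget` models forgetting as re-initialising a random fraction of weights, `pseudo_labels` keeps only Specialist predictions above a hypothetical confidence `threshold`, and `distillation_loss` is an ordinary cross-entropy of the Generalist against those labels.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def active_forget(weights, fraction, rng):
    """ASSUMPTION: model forgetting by re-initialising a random fraction
    of the Specialist's weights with small Gaussian noise."""
    n_reset = int(len(weights) * fraction)
    reset_idx = set(rng.sample(range(len(weights)), n_reset))
    return [rng.gauss(0.0, 0.01) if i in reset_idx else w
            for i, w in enumerate(weights)]


def pseudo_labels(specialist_logits, threshold=0.8):
    """Return (sample_index, class) pairs for confident Specialist
    predictions. The threshold is a hypothetical parameter."""
    kept = []
    for i, logits in enumerate(specialist_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            kept.append((i, probs.index(confidence)))
    return kept


def distillation_loss(generalist_logits, labels):
    """Mean cross-entropy of the Generalist against the pseudo labels."""
    if not labels:
        return 0.0
    total = 0.0
    for i, cls in labels:
        total -= math.log(softmax(generalist_logits[i])[cls])
    return total / len(labels)


# Toy usage: three unlabeled target-domain samples, three classes.
rng = random.Random(0)
specialist_weights = active_forget([1.0] * 10, fraction=0.3, rng=rng)
specialist_logits = [[4.0, 0.0, 0.0], [0.2, 0.1, 0.0], [0.0, 0.0, 5.0]]
generalist_logits = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

labels = pseudo_labels(specialist_logits)  # the ambiguous sample is dropped
loss = distillation_loss(generalist_logits, labels)
```

In this sketch the Generalist never sees the Specialist's weights, only its pseudo labels, which is one plausible reading of how knowledge could accumulate in the second encoder without the first encoder's forgetting affecting it.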