Abstract

Few-Shot Class-Incremental Learning (FSCIL) aims to integrate new concepts from a minimal set of instances while preserving previously acquired knowledge. This study explores the potential of Vision Transformers (ViTs) for addressing the challenges inherent in the FSCIL paradigm, notably catastrophic forgetting and overfitting. Drawing insights from cognitive neuroscience, we propose a Cognition-Driven Framework (CoDF) for FSCIL that leverages ViTs to emulate human cognitive processes along two dimensions: Intuitive Acquisition and Structured Cognition. On the one hand, we employ self-supervised learning techniques to imbue the representations with richer information, facilitating a more intuitive acquisition of the cues essential to solving FSCIL tasks. On the other hand, we structure the learned representations by introducing biases into the prior distribution of the latent factors, using a multivariate Gaussian Mixture Model (GMM) together with an intra-class distribution assumption. Furthermore, we apply an extended warm-up strategy to effectively harness the acquired representations for downstream tasks. Comprehensive experiments on three public datasets substantiate the efficacy of the proposed framework.
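To make the representation-structuring idea concrete, the following is a minimal sketch, not the authors' implementation, of how frozen backbone features could be organized under a per-class Gaussian assumption and then used for class-incremental classification. The classifier name, the tied-covariance choice for few-shot classes (a stand-in for the paper's intra-class distribution assumption), and the use of scikit-learn's GaussianMixture are all illustrative assumptions.

```python
# Illustrative sketch only: per-class Gaussian modeling of (assumed) ViT features
# for few-shot class-incremental classification. Not the paper's implementation.
import numpy as np
from sklearn.mixture import GaussianMixture


class GaussianPrototypeClassifier:
    """Keeps one Gaussian component per class over frozen backbone features.

    Base-session classes are fit with full covariance; few-shot classes added
    later reuse a shared (tied) covariance pooled from the base session.
    """

    def __init__(self):
        self.means = {}          # class id -> mean vector
        self.shared_cov = None   # tied covariance reused for few-shot classes

    def fit_base(self, feats, labels):
        # Fit one Gaussian per base class and pool their covariances.
        covs = []
        for c in np.unique(labels):
            x = feats[labels == c]
            gmm = GaussianMixture(n_components=1, covariance_type="full").fit(x)
            self.means[int(c)] = gmm.means_[0]
            covs.append(gmm.covariances_[0])
        self.shared_cov = np.mean(covs, axis=0)

    def add_few_shot(self, feats, labels):
        # With only a handful of shots, estimate the class mean only and
        # fall back on the pooled covariance from the base session.
        for c in np.unique(labels):
            self.means[int(c)] = feats[labels == c].mean(axis=0)

    def predict(self, feats):
        inv = np.linalg.pinv(self.shared_cov)
        classes = sorted(self.means)
        # Mahalanobis distance to each class mean under the tied covariance.
        dists = np.stack([
            np.einsum("nd,dk,nk->n", feats - self.means[c], inv, feats - self.means[c])
            for c in classes
        ], axis=1)
        return np.array(classes)[dists.argmin(axis=1)]
```

A typical usage pattern would be to extract features for the base session with a (self-supervised) ViT backbone, call `fit_base`, then call `add_few_shot` once per incremental session before evaluating with `predict`.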
