Abstract

Few-shot classification learns from a small number of image samples to recognize unseen images. Recent few-shot learning methods exploit auxiliary text information, such as class labels and names, to obtain more discriminative class prototypes. However, most existing approaches rarely use text information as a clue to highlight important feature regions, and they do not consider feature alignment between prototypes and targets, leading to prototype ambiguity owing to information gaps. To address this issue, a prototype generator module was developed to perform interactions between the text knowledge of the class name and the visual feature maps along the spatial and channel dimensions. This module learns to assign mixture weights to the essential regions of each sample feature, yielding informative prototypes. In addition, a feature refinement module was proposed to embed text information into query images without knowing their labels; it generates attention from features formed by concatenating query and text features, trained through a pairwise distance loss. Finally, to improve the alignment between the prototype and relevant targets, a prototype calibration module was designed to preserve the important features of the prototype by considering the interrelationships between the prototype and query features. Extensive experiments were conducted on five few-shot classification benchmarks, and the results demonstrate the superiority of the proposed method over state-of-the-art methods in both 1-shot and 5-shot settings.
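The core idea of the prototype generator, text-guided weighting of feature regions before pooling into a class prototype, can be illustrated with a minimal sketch. This is not the paper's implementation: the feature shapes, the dot-product similarity, and the softmax mixture weights are all simplifying assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def text_guided_prototype(feat_maps, text_emb):
    """Aggregate support feature maps into one class prototype,
    weighting every spatial location by its similarity to the
    class-name text embedding (hypothetical simplification of
    the spatial interaction described in the abstract).

    feat_maps: (N, C, H, W) support features for one class
    text_emb:  (C,) embedding of the class name
    returns:   (C,) prototype vector
    """
    n, c, h, w = feat_maps.shape
    flat = feat_maps.reshape(n, c, h * w)          # (N, C, HW)
    # similarity of each spatial position to the text embedding
    sims = np.einsum('c,ncp->np', text_emb, flat)  # (N, HW)
    attn = softmax(sims, axis=-1)                  # mixture weights per image
    # attention-weighted spatial pooling, then average over the N shots
    pooled = np.einsum('np,ncp->nc', attn, flat)   # (N, C)
    return pooled.mean(axis=0)

# toy usage: 5-shot support set, 64-channel 7x7 feature maps
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 64, 7, 7))
text = rng.standard_normal(64)
proto = text_guided_prototype(feats, text)
print(proto.shape)  # (64,)
```

Regions that align with the text embedding receive larger mixture weights, so the pooled prototype emphasizes class-relevant areas instead of averaging all locations uniformly, which is the intuition behind using text as a clue to highlight important feature regions.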
