Few-shot open-set recognition (FSOR) remains a relatively underexplored research area. The primary challenge in FSOR is to recognize known classes while simultaneously rejecting unknown classes using only a few labeled samples. Current FSOR methods rely predominantly on visual information extracted from images to build class representations, aiming to derive distinguishable classification scores for both known and unknown classes. However, these methods often overlook the semantic information carried by the class names associated with images, which can serve as a valuable auxiliary learning signal. This study introduces a feature-semantic augmentation network that exploits multimodal information to improve FSOR performance. Specifically, we augment the class-specific features of closed-set prototypes by integrating visual features with textual features of the known class names in both local and global feature spaces. To facilitate prototype learning, we introduce a refinement module and a fusion module: the former leverages the similarity between prototype and target features along both channel and spatial dimensions to calibrate targets with respect to their relevant prototypes, while the latter generates additional classification targets that serve as learning sources from different classes. Experimental results on various few-shot learning benchmarks show that the proposed method significantly outperforms current state-of-the-art methods in both closed- and open-set scenarios.
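Since the abstract only sketches the mechanism, the following minimal PyTorch sketch illustrates one plausible form of the described feature-semantic prototype augmentation: a per-class visual prototype is fused with a text embedding of the class name. The class `PrototypeAugmenter`, the MLP-based fusion, the mean-pooled support features, and the rejection threshold `tau` are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's released code) of fusing visual
# prototypes with class-name embeddings for few-shot open-set recognition.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeAugmenter(nn.Module):
    def __init__(self, visual_dim: int, text_dim: int):
        super().__init__()
        # Project class-name embeddings into the visual feature space, then
        # fuse the two modalities with a small MLP (an assumed fusion scheme;
        # the paper may combine local/global features differently).
        self.text_proj = nn.Linear(text_dim, visual_dim)
        self.fuse = nn.Sequential(
            nn.Linear(2 * visual_dim, visual_dim),
            nn.ReLU(inplace=True),
            nn.Linear(visual_dim, visual_dim),
        )

    def forward(self, support_feats: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # support_feats: (n_way, k_shot, visual_dim) visual features of the support set
        # text_emb:      (n_way, text_dim) embeddings of the known class names
        proto = support_feats.mean(dim=1)             # per-class visual prototype
        sem = self.text_proj(text_emb)                # semantics in visual space
        proto = self.fuse(torch.cat([proto, sem], dim=-1))
        return F.normalize(proto, dim=-1)             # unit-norm class prototypes

# Example: cosine scores against the augmented prototypes; a query whose
# maximum score falls below a threshold tau is rejected as unknown.
aug = PrototypeAugmenter(visual_dim=640, text_dim=512)
protos = aug(torch.randn(5, 1, 640), torch.randn(5, 512))   # 5-way 1-shot episode
queries = F.normalize(torch.randn(8, 640), dim=-1)          # 8 query features
scores = queries @ protos.T                                  # (8, 5) known-class scores
is_unknown = scores.max(dim=-1).values < 0.5                 # tau = 0.5, illustrative
```

Thresholding the maximum prototype similarity is a common open-set rejection baseline; the paper's refinement and fusion modules would further calibrate these scores.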