Few-shot image classification aims at exploring transferable features from base classes to recognize images of the unseen novel classes with only a few labeled images. Existing methods usually compare the support features and query features, which are implemented by either matching the global feature vectors or matching the local feature maps at the same position. However, few labeled images fail to capture all the diverse context and intraclass variations, leading to mismatch issues for existing methods. On one hand, due to the misaligned position and cluttered background, existing methods suffer from the object mismatch issue. On the other hand, due to the scale inconsistency between images, existing methods suffer from the scale mismatch issue. In this article, we propose the bilaterally normalized scale-consistent Sinkhorn distance (BSSD) to solve these issues. First, instead of same-position matching, we use the Sinkhorn distance to find an optimal matching between images, mitigating the object mismatch caused by misaligned position. Meanwhile, we propose the intraimage and interimage attentions as the bilateral normalization on the Sinkhorn distance to suppress the object mismatch caused by background clutter. Second, local feature maps are enhanced with the multiscale pooling strategy, making the Sinkhorn distance possible to find a consistent matching scale between images. Experimental results show the effectiveness of the proposed approach, and we achieve the state-of-the-art on three few-shot benchmarks.
Read full abstract