The goal of few-shot image classification is to learn a classifier that generalizes well to unseen classes given only a few labeled samples. One major challenge in few-shot learning is how to build effective representations of the support and query images. Recently, local region-based image representation and metric learning approaches have proven effective for few-shot classification. However, existing approaches generally represent image regions individually and therefore ignore the rich spatial and structural relationships among them. In this paper, we propose to bridge the individual regions and exploit the structural context among them via a novel Region-Graph Transformer (RGTransformer). In RGTransformer, each region aggregates information from its neighboring regions and thereby obtains a context-aware feature representation. Building on RGTransformer, we develop an effective metric learning model for few-shot image classification. We evaluate the proposed method on four benchmark datasets, and the experimental results demonstrate the effectiveness and advantages of RGTransformer.