Abstract

Few-shot Semantic Segmentation (FSS) refers to training a segmentation model that can generalize to novel categories from only a few labeled images. One challenge of FSS is the spatial inconsistency between support and query images, e.g., in appearance and texture. Most existing methods are committed only to utilizing the semantic-level prototypes of support images to guide mask prediction. These methods, however, capture only the most discriminative regions of an object rather than holistic feature representations. Besides, another limitation is the lack of interaction between paired support and query images. In this paper, we propose a self-align and cross-align transformer (SCTrans) to remedy these limitations. Specifically, we design a feature fusion module (FFM) that incorporates low-level information from the query branch into mid-level semantic features, boosting the semantic representations of query images. In addition, a feature alignment module bidirectionally propagates semantic information between support and query images conditioned on the more representative support and query features, increasing both intra-class similarity and inter-class difference. Extensive experiments on PASCAL-5i and COCO-20i show that our SCTrans significantly outperforms state-of-the-art methods.
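To make the feature-fusion idea concrete, below is a minimal sketch (not the authors' code) of one plausible reading of the FFM: low-level query features are merged into the coarser mid-level semantic features. The class name FeatureFusionModule, the channel widths, and the layer choices are illustrative assumptions, written in PyTorch.

```python
# Hypothetical sketch of a feature fusion module in the spirit of the FFM
# described in the abstract; channel sizes and layers are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusionModule(nn.Module):
    def __init__(self, low_channels=256, mid_channels=512, out_channels=256):
        super().__init__()
        # 1x1 conv projects the concatenated features to a common width.
        self.fuse = nn.Sequential(
            nn.Conv2d(low_channels + mid_channels, out_channels,
                      kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, low_feat, mid_feat):
        # Upsample the mid-level map to the low-level spatial size,
        # then concatenate along channels and fuse.
        mid_up = F.interpolate(mid_feat, size=low_feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([low_feat, mid_up], dim=1))

# Usage with dummy query-branch features (ResNet-like channel widths):
low = torch.randn(2, 256, 60, 60)   # low-level query features
mid = torch.randn(2, 512, 30, 30)   # mid-level semantic features
fused = FeatureFusionModule()(low, mid)
print(fused.shape)  # torch.Size([2, 256, 60, 60])
```

The design choice here, concatenation followed by a 1x1 projection, is one common way to inject fine-grained detail into semantic features; the paper's actual module may differ.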
