Abstract

Few-shot Semantic Segmentation (FSS) refers to training a segmentation model that can generalize to novel categories from only a few labeled images. One challenge of FSS is the spatial inconsistency between support and query images, e.g., in appearance and texture. Most existing methods are committed only to utilizing the semantic-level prototypes of support images to guide mask prediction. These methods, however, capture only the most discriminative regions of an object rather than holistic feature representations. Besides, another limitation is the lack of interaction between paired support and query images. In this paper, we propose a self-align and cross-align transformer (SCTrans) to remedy these limitations. Specifically, we design a feature fusion module (FFM) that incorporates low-level information from the query branch into mid-level semantic features, boosting the semantic representations of query images. In addition, a feature alignment module bidirectionally propagates semantic information between support and query images conditioned on the more representative support and query features, increasing both intra-class similarity and inter-class difference. Extensive experiments on PASCAL-5i and COCO-20i show that our SCTrans significantly outperforms state-of-the-art methods.
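To make the feature-fusion idea concrete, below is a minimal sketch (not the authors' code) of one plausible reading of the FFM: low-level query features are merged into the coarser mid-level semantic features. The class name FeatureFusionModule, the channel widths, and the layer choices are illustrative assumptions, written in PyTorch.

```python
# Hypothetical sketch of a feature fusion module in the spirit of the FFM
# described in the abstract; channel sizes and layers are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusionModule(nn.Module):
    def __init__(self, low_channels=256, mid_channels=512, out_channels=256):
        super().__init__()
        # 1x1 conv projects the concatenated features to a common width.
        self.fuse = nn.Sequential(
            nn.Conv2d(low_channels + mid_channels, out_channels,
                      kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, low_feat, mid_feat):
        # Upsample the mid-level map to the low-level spatial size,
        # then concatenate along channels and fuse.
        mid_up = F.interpolate(mid_feat, size=low_feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([low_feat, mid_up], dim=1))

# Usage with dummy query-branch features (ResNet-like channel widths):
low = torch.randn(2, 256, 60, 60)   # low-level query features
mid = torch.randn(2, 512, 30, 30)   # mid-level semantic features
fused = FeatureFusionModule()(low, mid)
print(fused.shape)  # torch.Size([2, 256, 60, 60])
```

The design choice here, concatenation followed by a 1x1 projection, is one common way to inject fine-grained detail into semantic features; the paper's actual module may differ.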
