Abstract
Weakly-supervised semantic segmentation (WSSS) has received increasing attention from the community in recent years, as it leverages weakly annotated data to address the lack of fully annotated data. Among WSSS approaches, methods based on image-level annotations are the most direct and effective, since image-level annotations are easy to obtain. Most advanced methods use class activation maps (CAM) as initial pseudo-labels; however, CAMs identify only local regions of the target and ignore the contextual information among those regions. To address this problem, this paper proposes a deformable convolution based self-attention module (DSAM), which introduces a pixel relationship matrix to learn the contextual information of the image. A regularization loss is introduced to narrow the distance between the DSAM output and the CAM. Compared with the baseline CAM method, our method identifies more target features and robustly improves WSSS performance without training the classifier multiple times. Our proposed method achieves mIoU of 65.5% and 66.8% on the Pascal VOC 2012 val and test sets, respectively, demonstrating the feasibility of the method.

Keywords: Deformable convolution · Self-attention · Convolutional neural network · Weakly-supervised semantic segmentation
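The abstract gives no implementation details, but the stated design can be illustrated. Below is a minimal PyTorch sketch, entirely our own assumption rather than the authors' code: a deformable convolution extracts features, their pairwise affinities form a pixel relationship matrix, that matrix propagates CAM scores between related pixels, and an L1 regularization loss pulls the refined map back toward the original CAM. All class, layer, and parameter names (DSAMSketch, mid_ch, etc.) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d


class DSAMSketch(nn.Module):
    """Hypothetical sketch of a deformable-convolution-based
    self-attention module: deformable conv features define a
    pixel relationship matrix that re-weights the CAM."""

    def __init__(self, in_ch: int, mid_ch: int = 64):
        super().__init__()
        # Offset field for a 3x3 deformable kernel: 2 * 3 * 3 = 18 channels.
        self.offset = nn.Conv2d(in_ch, 18, kernel_size=3, padding=1)
        self.deform = DeformConv2d(in_ch, mid_ch, kernel_size=3, padding=1)
        self.query = nn.Conv2d(mid_ch, mid_ch, kernel_size=1)
        self.key = nn.Conv2d(mid_ch, mid_ch, kernel_size=1)

    def forward(self, feats: torch.Tensor, cam: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) backbone features; cam: (B, K, H, W) class maps.
        x = self.deform(feats, self.offset(feats))
        b, c, h, w = x.shape
        q = self.query(x).flatten(2)                 # (B, C', HW)
        k = self.key(x).flatten(2)                   # (B, C', HW)
        # Pixel relationship matrix: (B, HW, HW) softmax-normalized affinities.
        rel = torch.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)
        # Propagate CAM scores across related pixels.
        refined = cam.flatten(2) @ rel.transpose(1, 2)  # (B, K, HW)
        return refined.view(b, -1, h, w)


def regularization_loss(dsam_out: torch.Tensor, cam: torch.Tensor) -> torch.Tensor:
    # Narrow the distance between the DSAM-refined map and the original CAM.
    return F.l1_loss(dsam_out, cam)
```

In this reading, the deformable offsets let the affinity computation sample context beyond a fixed grid, which is one plausible way to capture the "context information among local regions" the abstract describes.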