Abstract

Semantic segmentation is a fundamental task in computer vision, and it has achieved solid performance in the past few years. However, owing to inherent limitations of the classification architecture, it struggles to capture long-range dependencies in the image. To address this issue, we propose a new architecture that allows the model to mine the available information more extensively for classification and segmentation tasks in a weakly supervised manner. First, we propose a masking-based data enhancement approach in which images are randomly masked according to scale, forcing the model to attend to other parts of the object. Second, a long-range correlation matrix derived from the image itself is introduced so that the class activation mapping (CAM) covers foreground objects more completely. Finally, experimental results on the PASCAL VOC 2012 dataset show that, compared with other methods, our method better exploits both the salient parts and the non-salient regions of foreground objects in weakly labeled images. On the test set, our approach achieves an mIoU of 60.9% using ResNet-based segmentation models, outperforming other methods for the weakly supervised semantic segmentation (WSSS) task.

Keywords: Weakly supervised semantic segmentation (WSSS); Class activation mapping (CAM); Data enhancement; Self-attention mechanism
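The two ideas named in the abstract, scale-based random masking and CAM refinement through a long-range correlation matrix, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single square mask, and the choice of ReLU-clipped cosine similarity with row normalisation are all assumptions made for illustration, and plain Python lists stand in for image tensors.

```python
import math
import random


def random_scale_mask(image, scale, rng, fill=0.0):
    """Return a copy of `image` (2D list) with one random square patch masked.

    Assumption: `scale` is the ratio of the patch side to the shorter image side.
    """
    height, width = len(image), len(image[0])
    side = int(round(scale * min(height, width)))
    side = max(1, min(side, height, width))  # keep the patch inside the image
    top = rng.randrange(height - side + 1)
    left = rng.randrange(width - side + 1)
    masked = [row[:] for row in image]  # leave the input untouched
    for y in range(top, top + side):
        for x in range(left, left + side):
            masked[y][x] = fill
    return masked


def correlation_matrix(features):
    """Pairwise similarity between per-pixel feature vectors, row-normalised.

    Assumption: cosine similarity, with negative values clipped to zero.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm > 0 else 0.0

    matrix = []
    for u in features:
        row = [max(0.0, cosine(u, v)) for v in features]
        total = sum(row)
        matrix.append([r / total if total > 0 else 0.0 for r in row])
    return matrix


def refine_cam(cam, corr):
    """Propagate flattened CAM scores along the correlation matrix."""
    return [sum(w * c for w, c in zip(row, cam)) for row in corr]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[1.0] * 4 for _ in range(4)]
    print(random_scale_mask(image, 0.5, rng))

    # Pixels 0 and 1 share features; only pixel 0 is activated in the raw CAM.
    features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(refine_cam([1.0, 0.0, 0.0], correlation_matrix(features)))
```

In the toy example, the activation on pixel 0 spreads to pixel 1 because their features are similar, while the dissimilar pixel 2 stays at zero; this is the sense in which a correlation matrix can extend a CAM from salient parts to non-salient regions of the same object.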
