Abstract
Weakly supervised semantic segmentation has been attracting increasing attention as it can alleviate the need for expensive pixel-level annotations through the use of image-level labels. Relevant methods rely mainly on the implicit object-localization ability of convolutional neural networks (CNNs). However, the generated object attention maps remain mostly small and incomplete. In this paper, we propose an Atrous Convolutional Feature Network (ACFN) to generate dense object attention maps by enhancing the context representation of image classification CNNs. More specifically, cascaded atrous convolutions are used in the middle layers to retain sufficient spatial detail, and pyramidal atrous convolutions are used in the last convolutional layers to provide multi-scale context information for the extraction of object attention maps. Moreover, we propose an attentive fusion strategy to adaptively fuse the multi-scale features. Our method improves over existing methods on both the PASCAL VOC 2012 and MS COCO datasets, achieving state-of-the-art performance.
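To make the architectural idea concrete, the following is a minimal PyTorch sketch of the pyramidal atrous convolutions combined with an attentive fusion of the resulting multi-scale features, in the spirit of the abstract's description. The dilation rates, channel counts, and the exact form of the attention weights are illustrative assumptions, not the configuration reported in the paper, and the cascaded atrous convolutions of the middle layers are omitted.

```python
# Sketch of pyramidal atrous convolutions with attentive fusion.
# All hyperparameters (rates, channels) are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidalAtrousFusion(nn.Module):
    def __init__(self, in_ch=512, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        # Parallel atrous (dilated) 3x3 convolutions capture context at
        # multiple scales without reducing spatial resolution
        # (padding == dilation keeps the feature map size unchanged).
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        # A 1x1 convolution predicts one attention weight per branch at
        # each spatial location; a softmax over branches normalizes them.
        self.attn = nn.Conv2d(in_ch, len(rates), kernel_size=1)

    def forward(self, x):
        # (N, B, C, H, W): one feature map per atrous branch
        feats = torch.stack([F.relu(b(x)) for b in self.branches], dim=1)
        # (N, B, 1, H, W): per-branch, per-location fusion weights
        weights = torch.softmax(self.attn(x), dim=1).unsqueeze(2)
        # Attentive fusion: weighted sum over the branch dimension
        return (weights * feats).sum(dim=1)

if __name__ == "__main__":
    x = torch.randn(1, 512, 32, 32)
    print(PyramidalAtrousFusion()(x).shape)  # torch.Size([1, 256, 32, 32])
```

The per-location softmax lets the network emphasize different receptive-field sizes for different image regions, which is one plausible reading of "adaptively fuse the multi-scale features."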