Abstract

Multi-layer feature fusion is an important strategy for semantic segmentation, because a single-layer feature is usually unable to make an accurate prediction at every pixel. However, most current methods apply direct summation or channel concatenation to multi-layer features, without considering the distinctions and complementarity between them. To explore their respective importance and to achieve an appropriate fusion at each pixel, we propose a novel multi-layer adaptive feature fusion method for semantic segmentation based on the attention mechanism. Specifically, our method encourages the network to learn the importance of features from different layers according to the content of the input image and the specific capability of each layer's features, expressed in the form of weight maps. By multiplying the features pixel-wise with their corresponding weight maps, we change the response values proportionally at each pixel and obtain several weighted features. Finally, the weighted features are summed to obtain a highly fused feature for discrimination. A series of comparative experiments on two public datasets, PASCAL VOC 2012 and PASCAL-Person-Part, demonstrates the effectiveness of our method. Furthermore, we visualize the weight maps of the multi-layer features to facilitate an intuitive understanding of their importance at different locations.
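
The fusion step described above (weight each layer's feature pixel-wise, then sum) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several details are assumptions made only for illustration: the names `softmax_over_layers` and `adaptive_fusion` are hypothetical, the layer features are assumed to be already resized to a common shape, the raw per-layer score maps are taken as given (in the method they would come from a learned attention branch), and the weight maps are normalized with a softmax across layers, which the abstract does not specify.

```python
import numpy as np


def softmax_over_layers(scores):
    """Normalize raw scores across the layer axis (axis 0) at every pixel.

    Assumption: a softmax is used so the weights at each pixel sum to 1.
    """
    shifted = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)


def adaptive_fusion(features, scores):
    """Fuse multi-layer features with per-pixel weight maps.

    features: array of shape (L, C, H, W), one feature per layer
    scores:   array of shape (L, H, W), one raw score map per layer
    Returns the fused feature (C, H, W) and the weight maps (L, H, W).
    """
    weights = softmax_over_layers(scores)
    # Broadcast each weight map over the channel axis, so every channel
    # at a given pixel is scaled by the same layer weight.
    weighted = features * weights[:, None, :, :]
    # Sum the weighted features over the layer axis.
    fused = weighted.sum(axis=0)
    return fused, weights


# Toy inputs: 3 layers, 8 channels, 4x4 spatial resolution.
rng = np.random.default_rng(0)
features = rng.standard_normal((3, 8, 4, 4))
scores = rng.standard_normal((3, 4, 4))

fused, weights = adaptive_fusion(features, scores)
print(fused.shape, weights.shape)
```

Under the softmax assumption, equal scores at a pixel reduce the fusion to a plain average of the layers there, while a dominant score lets one layer's response pass through nearly unchanged. The returned `weights` array is the kind of per-layer map the abstract describes visualizing.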

