Multi-layer Adaptive Feature Fusion for Semantic Segmentation

Yizhen Chen, Haifeng Hu

doi:10.1007/s11063-019-10129-2

Abstract

Multi-layer feature fusion is an important strategy for semantic segmentation, as a single-layer feature is usually unable to make an accurate prediction at every pixel. However, most current methods apply direct summation or channel concatenation to multi-layer features, without considering the distinctions and complementarity between them. To explore their respective importance and to achieve an appropriate fusion at each pixel, we propose a novel multi-layer adaptive feature fusion method for semantic segmentation, based on an attention mechanism. Specifically, our method encourages the network to learn the importance of features from different layers according to the content of the input image and the specific capability of each layer's features, expressed in the form of a weight map. By multiplying the features pixel-wise with their corresponding weight maps, we change the response values proportionally at each pixel and obtain several weighted features. Finally, the weighted features are summed to obtain a highly fused feature for discrimination. A series of comparative experiments on two public datasets, PASCAL VOC 2012 and PASCAL-Person-Part, demonstrates the effectiveness of our method. Furthermore, we visualize the weight maps of the multi-layer features to give an intuitive understanding of their importance at different locations.
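
The fusion step described in the abstract (multiply each layer's feature by its weight map pixel-wise, then sum over layers) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `adaptive_fusion`, the array shapes, and the softmax normalization of the weight maps across layers are assumptions made for illustration; the abstract does not say how the weight maps are produced or normalized, and here they are simply passed in as precomputed scores.

```python
import numpy as np

def adaptive_fusion(features, weight_logits):
    """Fuse K same-shape feature maps using per-pixel weight maps.

    features:      array of shape (K, C, H, W), one feature map per layer
    weight_logits: array of shape (K, H, W), unnormalized per-pixel scores
                   (hypothetical input; in a network these would be learned)
    """
    # Assumption: normalize the scores across the K layers with a softmax,
    # so the weights at each pixel are positive and sum to 1.
    shifted = weight_logits - weight_logits.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    weights = exp / exp.sum(axis=0, keepdims=True)   # (K, H, W)

    # Pixel-wise multiplication: each weight map is broadcast over channels.
    weighted = features * weights[:, None, :, :]     # (K, C, H, W)

    # Sum the weighted features over the layer axis to get the fused feature.
    fused = weighted.sum(axis=0)                     # (C, H, W)
    return fused, weights

# Toy usage: 3 layers, 8 channels, a 4x4 spatial grid of random values.
rng = np.random.default_rng(0)
features = rng.standard_normal((3, 8, 4, 4))
logits = rng.standard_normal((3, 4, 4))
fused, weights = adaptive_fusion(features, logits)
print(fused.shape)  # (8, 4, 4)
```

With all scores equal, the weights are uniform and the result reduces to a plain average of the layers, which is the fixed-weight fusion the adaptive scheme is meant to improve on.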

Full Text