Abstract

Multi-level features and contextual information have proven effective for semantic segmentation. As a common solution, dilated convolution with multiple dilation rates is widely adopted in various computer vision tasks. However, because a large number of pixels are not involved in the convolutional calculation, dilated convolution suffers from information loss. Another issue is that most feature fusion methods combine features from different levels directly, ignoring the semantic gap between shallow-layer and deep-layer features. In this work, we propose multi-scale cross convolution to alleviate the information loss problem. Cross convolution performs convolutional operations in the horizontal and vertical directions with different kernels. By combining cross convolution with dilated convolutions that use different kernels, more pixels are engaged in the convolutional operation, capturing multi-scale features. To address the semantic gap between multi-layer features, a feature fusion scheme is developed in which a dual attention mechanism refines features in both the spatial and channel dimensions. Comprehensive experiments on the Cityscapes and ADE20K datasets demonstrate that the proposed cross convolution and feature fusion method significantly improves segmentation performance and achieves competitive results compared with state-of-the-art approaches.
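To make the cross-convolution idea concrete, the following is a minimal NumPy sketch, not the paper's exact formulation: a horizontal pass and a vertical pass use different 1D kernels, each optionally dilated, and their responses are summed. The function names, the zero padding, and the additive combination of the two passes are illustrative assumptions.

```python
import numpy as np

def directional_conv(x, kernel, dilation=1, axis=0):
    """Correlate a 2D map with a 1D kernel along one axis
    (axis=0 vertical, axis=1 horizontal), zero-padded so the
    output keeps the input shape. Dilation inserts gaps between
    kernel taps, as in dilated convolution."""
    k = len(kernel)
    span = (k - 1) * dilation          # receptive-field extent along the axis
    pad = span // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, span - pad)
    xp = np.pad(np.asarray(x, dtype=float), pad_width)
    out = np.zeros_like(xp[:x.shape[0], :x.shape[1]] if axis else xp[:x.shape[0], :x.shape[1]])
    out = np.zeros(x.shape, dtype=float)
    for i, w in enumerate(kernel):
        shift = i * dilation
        sl = [slice(None), slice(None)]
        sl[axis] = slice(shift, shift + x.shape[axis])
        out += w * xp[tuple(sl)]       # shifted, weighted copy of the input
    return out

def cross_conv(x, kernel_h, kernel_v, dilation=1):
    """Cross convolution (illustrative): sum a horizontal and a
    vertical pass that use *different* kernels, so pixels along
    both axes contribute even at large dilation rates."""
    return (directional_conv(x, kernel_h, dilation, axis=1) +
            directional_conv(x, kernel_v, dilation, axis=0))
```

Applying `cross_conv` to a single-pixel impulse shows the characteristic cross-shaped footprint: the horizontal kernel spreads the response along the row and the vertical kernel along the column, with `dilation` controlling the spacing of the taps.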
