Abstract

Semantic segmentation is vital for scene understanding in autonomous driving. It provides more precise, per-pixel object information than raw RGB images, which in turn boosts the performance of autonomous driving systems. Recently, self-attention methods have shown strong improvements in image semantic segmentation: attention maps aid scene parsing by capturing rich relationships between every pair of pixels in an image. However, computing such maps is computationally demanding. Moreover, existing works focus either on channel attention, ignoring pixel position, or on spatial attention, disregarding the interactions among channels. To address these problems, we present the Fusion Attention Network, built on the self-attention mechanism, to harvest rich contextual dependencies. The model consists of two chains: pyramid fusion spatial attention and fusion channel attention. We apply pyramid sampling in the spatial attention module to reduce the cost of computing spatial attention maps. The channel attention module has a structure similar to the spatial one. We also introduce a fusion technique that computes contextual dependencies using features from both attention chains. The outputs of the spatial and channel attention modules are concatenated into an enhanced attention map, leading to better semantic segmentation results. We conduct extensive experiments on popular datasets under different settings, together with an ablation study, to demonstrate the efficiency of our approach. Our model achieves better results on Cityscapes [7] than state-of-the-art methods and also shows good generalization capability on PASCAL VOC 2012 [9].
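To make the two-chain design concrete, the sketch below illustrates one plausible reading of the abstract in PyTorch: a spatial attention branch whose keys and values are pyramid-pooled to shrink the attention map, a channel attention branch that builds a C x C affinity, and a head that concatenates the two outputs. All module names, pooling bin sizes, and layer choices are our own assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidSpatialAttention(nn.Module):
    """Spatial self-attention with pyramid-pooled keys/values (illustrative sketch).

    Bin sizes and projection widths are assumptions, not the paper's exact design.
    """

    def __init__(self, channels, bins=(1, 3, 6, 8)):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.bins = bins
        self.gamma = nn.Parameter(torch.zeros(1))

    def _pyramid(self, x):
        # Pool to a few small grids and flatten, so attention is (H*W) x S
        # with S << H*W instead of a full (H*W) x (H*W) map.
        pooled = [F.adaptive_avg_pool2d(x, b).flatten(2) for b in self.bins]
        return torch.cat(pooled, dim=2)  # B x C' x S

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)       # B x (H*W) x C//8
        k = self._pyramid(self.key(x))                      # B x C//8 x S
        v = self._pyramid(self.value(x)).transpose(1, 2)    # B x S x C
        attn = torch.softmax(q @ k, dim=-1)                 # B x (H*W) x S
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return self.gamma * out + x                         # residual connection


class ChannelAttention(nn.Module):
    """Channel self-attention: a C x C affinity map reweights the channels."""

    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        feat = x.flatten(2)                                         # B x C x (H*W)
        attn = torch.softmax(feat @ feat.transpose(1, 2), dim=-1)   # B x C x C
        out = (attn @ feat).reshape(b, c, h, w)
        return self.gamma * out + x


class FusionAttentionHead(nn.Module):
    """Runs both attention chains and concatenates their outputs for prediction."""

    def __init__(self, channels, num_classes):
        super().__init__()
        self.spatial = PyramidSpatialAttention(channels)
        self.channel = ChannelAttention()
        self.classifier = nn.Conv2d(2 * channels, num_classes, 1)

    def forward(self, x):
        fused = torch.cat([self.spatial(x), self.channel(x)], dim=1)
        return self.classifier(fused)


if __name__ == "__main__":
    head = FusionAttentionHead(channels=64, num_classes=19)  # 19 Cityscapes classes
    logits = head(torch.randn(2, 64, 32, 64))
    print(logits.shape)  # torch.Size([2, 19, 32, 64])
```

With pyramid bins of (1, 3, 6, 8) the attention map has only 110 key positions per query, regardless of input resolution, which is the kind of saving the abstract attributes to pyramid sampling.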
