Abstract

Semantic segmentation of complex traffic scenes is a challenging research topic in computer vision. Algorithms based on convolutional neural networks have achieved better results than traditional algorithms, but their segmentation performance still degrades in real scenes with complex backgrounds and variable object scales. To address this issue, this study proposes a fully convolutional network architecture based on a multi-scale attention pyramid that improves semantic segmentation performance from several perspectives. First, we design a lightweight dual attention module based on depthwise separable convolution. This module uses depthwise separable convolution to simplify the modeling of semantic correlations along the spatial and channel dimensions, reducing the parameter count of the original dual attention module. Second, we construct a multi-scale attention pyramid module, which uses feature maps with different receptive fields or different scales to output multiple prediction results. Finally, we design an adaptive multi-scale prediction fusion module that adaptively fuses the predictions from these different receptive fields or scales, further enhancing the network's predictive capability and producing detailed high-resolution prediction maps. Compared with the DANet baseline, the proposed network achieves better results on the Cityscapes, PASCAL VOC 2012, and COCO Stuff datasets. We make the code publicly available at https://github.com/Exception-star/AMDANet.
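
The abstract does not detail the module internals, which are in the linked repository. As a rough illustration only, the following PyTorch sketch shows how a lightweight dual attention block might combine DANet-style position and channel attention with depthwise separable convolutions. The class names (`DepthwiseSeparableConv`, `LightweightDualAttention`), the reduction ratio, and the learnable residual scales are assumptions for the sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv followed by a 1x1 pointwise conv.

    Replaces a standard 3x3 conv with far fewer parameters:
    roughly C*9 + C*C' instead of C*C'*9.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class LightweightDualAttention(nn.Module):
    """Hypothetical dual (position + channel) attention block in which
    the query/key/value projections of a DANet-style head use
    depthwise separable convolutions instead of standard convolutions.
    """
    def __init__(self, channels, reduction=8):
        super().__init__()
        inter = channels // reduction
        self.query = DepthwiseSeparableConv(channels, inter)
        self.key = DepthwiseSeparableConv(channels, inter)
        self.value = DepthwiseSeparableConv(channels, channels)
        # Learned scales for the two residual branches (assumed design).
        self.gamma_pos = nn.Parameter(torch.zeros(1))
        self.gamma_chn = nn.Parameter(torch.zeros(1))
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Position attention: pixel-to-pixel affinity over H*W locations.
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c')
        k = self.key(x).flatten(2)                     # (b, c', hw)
        v = self.value(x).flatten(2)                   # (b, c, hw)
        pos = self.softmax(torch.bmm(q, k))            # (b, hw, hw)
        pos_out = torch.bmm(v, pos.transpose(1, 2)).view(b, c, h, w)
        # Channel attention: channel-to-channel affinity on raw features.
        feat = x.flatten(2)                            # (b, c, hw)
        chn = self.softmax(torch.bmm(feat, feat.transpose(1, 2)))  # (b, c, c)
        chn_out = torch.bmm(chn, feat).view(b, c, h, w)
        # Sum the two attention branches into the input feature map.
        return x + self.gamma_pos * pos_out + self.gamma_chn * chn_out
```

In a multi-scale attention pyramid, a block like this would be applied to feature maps at several resolutions, with each scale producing its own prediction before the adaptive fusion stage combines them; those stages are not reproduced here.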
