Abstract

Semantic segmentation is a task that covers most of the perception needs of intelligent vehicles in a unified way. Recent studies have shown that attention mechanisms achieve impressive performance in computer vision tasks. Current attention-based segmentation methods differ from each other in the position and form of the attention mechanism, and perform differently in practice. This paper first examines the effectiveness of multi-scale context features and attention mechanisms in segmentation tasks. We find that multi-scale features and channel attention play a vital role in constructing effective context features. Based on this analysis, we propose an efficient attention pyramid network (EAPNet) for semantic segmentation. Specifically, to efficiently handle the problem of segmenting objects at multiple scales, we design an efficient channel attention pyramid (ECAP), which employs atrous convolution with channel attention, in cascade or in parallel, to capture multi-scale context using multiple atrous rates. Furthermore, we propose a residual attention fusion block (RAFB) that simultaneously attends to meaningful low-level feature maps and spatial location information. We also explore different channel attention and spatial attention modules and describe their impact on network performance. We empirically evaluate EAPNet on two semantic segmentation datasets, PASCAL VOC 2012 and Cityscapes. Experimental results show that, without MS COCO pre-training or any post-processing, EAPNet achieves 81.7% mIoU on the PASCAL VOC 2012 validation set. With DeepLabv3+ as the baseline, EAPNet improves performance by more than 1.50% mIoU.

Highlights

  • Semantic segmentation [2], [15], [33] is one of the fundamental topics in computer vision, whose purpose is to divide visual input into different semantically interpretable categories. ‘‘Semantic interpretability’’ means that the category is meaningful in the real world

  • In order to extract multi-scale context information with channel attention, we propose an efficient channel attention pyramid (ECAP) module

  • For low-level features, we propose a residual attention fusion block (RAFB), in which channel attention and spatial attention are applied in sequence and joined by a skip connection
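The RAFB design described above (channel attention followed by spatial attention, wrapped by a skip connection) can be sketched in a simplified, parameter-free form. This is an illustrative NumPy sketch, not the paper's implementation: the real block would use learned convolutions and pooling to compute the attention weights, and the function names here (`channel_attention`, `spatial_attention`, `rafb`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    # x: (C, H, W). Squeeze each channel to a scalar by global average
    # pooling, gate it with a sigmoid, and reweight the channel.
    w = sigmoid(x.mean(axis=(1, 2)))        # (C,)
    return x * w[:, None, None]

def spatial_attention(x):
    # Pool across channels to get one score per spatial location,
    # then gate every channel at that location.
    m = sigmoid(x.mean(axis=0))             # (H, W)
    return x * m[None, :, :]

def rafb(x):
    # Channel attention and spatial attention applied in sequence,
    # joined to the input by a skip connection (residual form).
    return x + spatial_attention(channel_attention(x))
```

The residual form keeps the original low-level features intact while the attention path only adds a refined, reweighted version on top, which is what the skip connection in the bullet above provides.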


Summary

INTRODUCTION

Semantic segmentation [2], [15], [33] is one of the fundamental topics in computer vision, whose purpose is to divide visual input into different semantically interpretable categories. ‘‘Semantic interpretability’’ means that the category is meaningful in the real world. The pyramid pooling module (PPM) [46] and atrous spatial pyramid pooling (ASPP) [4] aggregate multi-scale context information from different regions, thereby improving the ability to obtain global information. On this basis, DeepLabv3+ [5] improves ASPP and applies an encoder-decoder structure that fuses information from low-level and high-level features to predict the segmentation mask. To capture multi-scale information, we construct a pyramid network using parallel atrous convolutions and fuse channel attention into the pyramid network; on this basis, we introduce the efficient channel attention pyramid (ECAP) module. We use an encoder-decoder structure to fuse high-level and low-level features, and propose the efficient attention pyramid network. By upsampling the segmentation layers level by level, the final segmentation map is generated
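The core idea behind ASPP-style pyramids, and hence behind ECAP, is to run several atrous (dilated) convolutions in parallel at different rates so that one layer sees context at several scales at once. The following is a minimal 1-D NumPy sketch of that idea only; the function names and the toy smoothing kernel are assumptions for illustration, not the paper's components, and a real module would operate on 2-D feature maps with learned kernels and fuse the branches with channel attention.

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    # 'Same'-padded 1-D atrous convolution: taps are spaced `rate`
    # apart, enlarging the receptive field without adding parameters.
    k = len(kernel)
    pad = rate * (k - 1) // 2
    xp = np.pad(x, pad)
    return np.array([
        sum(kernel[j] * xp[i + j * rate] for j in range(k))
        for i in range(len(x))
    ])

def atrous_pyramid(x, rates=(1, 6, 12, 18)):
    # Parallel branches with different atrous rates capture context at
    # multiple scales; their outputs are stacked for later fusion
    # (e.g. by a channel-attention module, as in ECAP).
    kernel = np.array([1.0, 1.0, 1.0]) / 3.0   # toy 3-tap averaging kernel
    return np.stack([dilated_conv1d(x, kernel, r) for r in rates])
```

With a 3-tap kernel at rate 18, each output position aggregates inputs 18 positions away on either side, so the parallel branches cover receptive fields of very different sizes at identical parameter cost.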

EFFICIENT CHANNEL ATTENTION PYRAMID
EXPERIMENTS
EXPERIMENTAL DETAILS
Findings
CONCLUSION