Abstract

Semantic segmentation links each pixel in an image to a class label and is widely used in autonomous driving and robotics. Although deep learning methods have made great progress in semantic segmentation, they either achieve strong results at the cost of numerous parameters or adopt lightweight designs that heavily sacrifice segmentation accuracy. Given the strict requirements of real-world applications, it is critical to design an effective real-time model with both competitive segmentation accuracy and small model capacity. In this paper, we propose a lightweight network named DABNet, which employs Depth-wise Asymmetric Bottleneck (DAB) and Point-wise Aggregation Decoder (PAD) modules to tackle challenging real-time semantic segmentation in urban scenes. Specifically, the DAB module creates a sufficient receptive field and densely utilizes contextual information, while the PAD module aggregates feature maps of different scales through an attention mechanism to optimize performance. Compared with existing methods, our network substantially reduces the number of parameters while still achieving high accuracy with real-time inference. Extensive ablation experiments on two challenging urban-scene datasets (Cityscapes and CamVid) demonstrate the effectiveness of the proposed approach for real-time semantic segmentation.
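
The paper's code for PAD is not reproduced here; the following PyTorch sketch only illustrates the point-wise attention idea the abstract describes, where a 1x1 convolution on the deep, low-resolution features produces per-pixel weights that re-weight the shallow, high-resolution features before fusion. The class name, layer layout, and exact fusion rule are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointwiseAttentionFusion(nn.Module):
    """Hypothetical sketch of a PAD-style fusion step: a point-wise (1x1)
    convolution turns deep features into a per-pixel attention map that
    re-weights shallow features before the two scales are aggregated."""

    def __init__(self, shallow_ch, deep_ch):
        super().__init__()
        # point-wise projections: one for the attention map, one for fusion
        self.attn = nn.Conv2d(deep_ch, shallow_ch, kernel_size=1)
        self.proj = nn.Conv2d(deep_ch, shallow_ch, kernel_size=1)

    def forward(self, shallow, deep):
        # upsample deep features to the shallow feature resolution
        deep_up = F.interpolate(deep, size=shallow.shape[2:],
                                mode="bilinear", align_corners=False)
        attn = torch.sigmoid(self.attn(deep_up))    # per-pixel weights in (0, 1)
        return shallow * attn + self.proj(deep_up)  # attention-weighted aggregation
```

Usage would look like `fused = PointwiseAttentionFusion(shallow_ch=64, deep_ch=128)(x_shallow, x_deep)`; because all learned layers are 1x1 convolutions, the fusion adds very few parameters, which matches the paper's lightweight goal.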

Highlights

  • Deep convolutional neural network (DCNN) based on encoder-decoder framework has become a powerful architecture for dense prediction tasks, such as pose estimation and semantic segmentation

  • Based on Depth-wise Asymmetric Bottleneck (DAB) and Point-wise Aggregation Decoder (PAD), we propose a lightweight network called DABNet, which has far fewer parameters than existing state-of-the-art real-time semantic segmentation methods while providing impressive accuracy and faster inference speed

  • In this paper, we propose the Depth-wise Asymmetric Bottleneck with Point-wise Aggregation Decoder for real-time semantic segmentation in urban scenes


Summary

INTRODUCTION

Deep convolutional neural networks (DCNNs) based on the encoder-decoder framework have become a powerful architecture for dense prediction tasks such as pose estimation and semantic segmentation. For example, PSPNet [5] proposed a pyramid pooling module (PPM) that improves performance by aggregating context information at different scales. Although these architectures have brought a significant increase in accuracy, their heavy structure and immense memory footprint make them infeasible for real-life applications. Many existing semantic segmentation models that aim at real-time operation employ dilated convolution to improve performance; another method to raise efficiency is depth-wise separable convolution (ds-Conv). We propose an efficient module named Depth-wise Asymmetric Bottleneck (DAB), which extracts local and contextual information jointly and dramatically reduces the parameter count, making it suitable for high-resolution urban scenes. Based on DAB and PAD, we propose a lightweight network called DABNet, which has far fewer parameters than existing state-of-the-art real-time semantic segmentation methods while providing impressive accuracy and faster inference speed. On the same device (GTX 1080Ti), our DABNet achieves 4.3% higher accuracy than the state-of-the-art ICNet [24], while using only 10% of ICNet's parameters with nearly the same inference speed (23.7 vs 25.1).
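
To make the DAB design concrete, below is a minimal PyTorch sketch of a DAB-style block consistent with the description above: a bottleneck that halves the channels, two parallel depth-wise asymmetric (3x1 / 1x3) branches, one of them dilated for a larger receptive field, and a residual connection. The channel-halving factor, BN/PReLU placement, and dilation rate are assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class DABModule(nn.Module):
    """Sketch of a Depth-wise Asymmetric Bottleneck block: the depth-wise
    asymmetric branches capture local detail, the dilated branch captures
    context, and the residual connection preserves the input signal."""

    def __init__(self, channels, dilation=2):
        super().__init__()
        mid = channels // 2  # bottleneck: halve channels to cut parameters
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, mid, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid), nn.PReLU(mid))
        # local branch: depth-wise asymmetric convs (groups=mid => depth-wise)
        self.local = nn.Sequential(
            nn.Conv2d(mid, mid, (3, 1), padding=(1, 0), groups=mid, bias=False),
            nn.Conv2d(mid, mid, (1, 3), padding=(0, 1), groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.PReLU(mid))
        # context branch: same shapes, but dilated for a larger receptive field
        self.context = nn.Sequential(
            nn.Conv2d(mid, mid, (3, 1), padding=(dilation, 0),
                      dilation=(dilation, 1), groups=mid, bias=False),
            nn.Conv2d(mid, mid, (1, 3), padding=(0, dilation),
                      dilation=(1, dilation), groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.PReLU(mid))
        self.expand = nn.Conv2d(mid, channels, 1, bias=False)  # point-wise restore

    def forward(self, x):
        y = self.reduce(x)
        y = self.local(y) + self.context(y)  # fuse local and contextual information
        return x + self.expand(y)            # residual connection
```

Because the asymmetric convolutions are depth-wise (`groups=mid`), each contributes only `3 * mid` weights, which is where most of the parameter saving over a standard 3x3 convolution comes from; this is the same ds-Conv principle mentioned above.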
