Abstract

Deep convolutional neural networks have been widely used for image semantic segmentation in recent years; however, their deployment on mobile terminals is limited by high computational costs. Given the slow inference speed and large memory usage of deep convolutional neural networks, we propose a lightweight and densely connected pyramid network (LDPNet) for real-time semantic segmentation. First, a densely connected atrous pyramid (DCAP) module is constructed in the encoding stage to extract multi-scale context information for forward propagation, strengthen feature reuse, and offset the spatial information lost when the feature map is down-sampled. Second, a cross-fusion (CF) module, in which high-level and low-level features are embedded into each other during decoding, is proposed; it uses high-level semantic features to effectively guide the fusion of low-level spatial details while strengthening context information. Our network is evaluated on two complex urban road scene datasets. On the Cityscapes dataset, our model runs at 87 frames per second (FPS) on a single NVIDIA GTX 1080Ti GPU, reaches 71.1% mean Intersection over Union (mIoU), and has only 0.8M parameters. Compared with existing networks of a similar scale, it achieves a state-of-the-art trade-off between efficiency and accuracy.
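
The paper specifies the exact structure of the DCAP module in its methods section; as a rough illustration of the idea stated above (parallel atrous convolutions at several dilation rates, densely connected so that later branches reuse earlier multi-scale features), a minimal PyTorch sketch is given below. The class name, dilation rates, growth width, and projection layer are illustrative assumptions, not the authors' exact design.

    # Minimal sketch of a densely connected atrous pyramid (DCAP) style block,
    # assuming only what the abstract states: parallel atrous (dilated)
    # convolutions whose outputs are densely concatenated so that later
    # branches reuse earlier multi-scale features. Dilation rates, channel
    # widths, and names are illustrative assumptions, not the paper's design.
    import torch
    import torch.nn as nn

    class DCAPSketch(nn.Module):
        def __init__(self, in_channels, growth=32, dilations=(1, 2, 4, 8)):
            super().__init__()
            self.branches = nn.ModuleList()
            channels = in_channels
            for d in dilations:
                # Each branch sees the concatenation of the input and all
                # previous branch outputs (dense connection), then applies
                # a 3x3 atrous convolution with its own dilation rate.
                self.branches.append(nn.Sequential(
                    nn.Conv2d(channels, growth, kernel_size=3,
                              padding=d, dilation=d, bias=False),
                    nn.BatchNorm2d(growth),
                    nn.ReLU(inplace=True),
                ))
                channels += growth
            # 1x1 projection back to the input width after the final concatenation.
            self.project = nn.Conv2d(channels, in_channels, kernel_size=1, bias=False)

        def forward(self, x):
            features = [x]
            for branch in self.branches:
                features.append(branch(torch.cat(features, dim=1)))
            return self.project(torch.cat(features, dim=1))

    # Quick shape check: spatial resolution is preserved.
    x = torch.randn(1, 64, 32, 64)
    print(DCAPSketch(64)(x).shape)  # torch.Size([1, 64, 32, 64])

Because every branch receives the concatenation of the input and all earlier branch outputs, multi-scale context is carried forward and features are reused, which is the behaviour the abstract attributes to the DCAP module.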

Highlights

  • The transformation from an experience-driven, hand-crafted feature paradigm to a data-driven representation learning paradigm is realized by means of deep learning with its strong nonlinear modeling capability

  • The mean Intersection over Union (mIoU), frames per second (FPS), and model parameters under different configurations are reported in Table 5, which shows that the number of atrous pyramid (AP) bottleneck blocks has a more significant impact on the model in the third stage than in the second stage

  • The densely connected atrous pyramid (DCAP) module and the cross-fusion (CF) module are proposed for the segmentation of complex urban road scenes (an illustrative sketch of the CF idea follows this list)
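
As with DCAP, the exact cross-fusion design is given in the paper itself; the sketch below only illustrates, under stated assumptions, the decoding behaviour described in the abstract: upsampled high-level semantic features guide (here, spatially gate) low-level spatial details before the two streams are merged. The gating mechanism, layer names, and channel widths are assumptions for illustration, not the authors' design.

    # Minimal sketch of a cross-fusion (CF) style decoder step, assuming only
    # the behaviour described in the abstract: upsampled high-level semantic
    # features guide (here, spatially gate) low-level spatial details before
    # the two streams are merged. Layer names and widths are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossFusionSketch(nn.Module):
        def __init__(self, low_channels, high_channels, out_channels):
            super().__init__()
            # Align both inputs to a common channel width before fusing.
            self.low_proj = nn.Conv2d(low_channels, out_channels, 1, bias=False)
            self.high_proj = nn.Conv2d(high_channels, out_channels, 1, bias=False)
            # High-level features produce a gate applied to the low-level path.
            self.gate = nn.Sequential(
                nn.Conv2d(out_channels, out_channels, 1, bias=False),
                nn.Sigmoid(),
            )
            self.fuse = nn.Sequential(
                nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )

        def forward(self, low, high):
            # Upsample the coarse high-level map to the low-level resolution.
            high = F.interpolate(self.high_proj(high), size=low.shape[2:],
                                 mode="bilinear", align_corners=False)
            low = self.low_proj(low)
            # Semantic features gate the spatial details; the streams are then summed.
            return self.fuse(low * self.gate(high) + high)

    low = torch.randn(1, 32, 64, 128)   # fine resolution, few channels
    high = torch.randn(1, 128, 16, 32)  # coarse resolution, many channels
    print(CrossFusionSketch(32, 128, 64)(low, high).shape)  # [1, 64, 64, 128]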

Summary

INTRODUCTION

The transformation from an experience-driven, hand-crafted feature paradigm to a data-driven representation learning paradigm has been realized by means of deep learning with its strong nonlinear modeling capability. However, PSPNet [2] and DeepLab [3] have 250.8M and 262.1M parameters, respectively; both are more than 100 layers deep, and their inference speed falls far below the minimum frame rate required for video (24 frames per second). Such large-scale, high-precision models still require long processing times, even when running on the most advanced modern GPUs [4]. Small segmentation networks that are low in computing cost, fast in inference, and memory-friendly are therefore often desired. In response to these problems, researchers have proposed many real-time semantic segmentation networks based on deep learning in recent years. Based on the above analysis, a lightweight, densely connected pyramid network (LDPNet) is proposed for real-time semantic segmentation tasks.

RELATED WORK
DENSELY CONNECTED ATROUS PYRAMID MODULES
CROSS FUSION MODULE
EXPERIMENTS
DATASET
LOSS FUNCTION
Method
COMPARISON WITH STATE-OF-THE-ARTS
Findings
CONCLUSION