Abstract
Deep convolutional neural networks have been widely used in image semantic segmentation in recent years; their deployment on mobile terminals, however, is limited by high computational cost. Given the slow inference speed and large memory usage of deep convolutional neural networks, we propose a lightweight and densely connected pyramid network (LDPNet) for real-time semantic segmentation. Firstly, a densely connected atrous pyramid (DCAP) module is constructed in the encoding stage to extract multi-scale context information for forward propagation, strengthen feature reuse, and offset the spatial information lost when the feature map is down-sampled. Secondly, a cross-fusion (CF) module is proposed for the decoding stage, in which high-level semantic features effectively guide the fusion of low-level spatial details while strengthening context information. Our network is evaluated on two complex urban road scene datasets. On the Cityscapes dataset, our model runs at 87 frames per second (FPS) on a single NVIDIA GTX 1080 Ti GPU, reaches a Mean Intersection over Union (mIoU) of 71.1%, and has only 0.8M parameters. Compared with existing networks of similar scale, the proposed network achieves a state-of-the-art trade-off between efficiency and accuracy.
Highlights
The transformation from an experience-driven, hand-crafted feature paradigm to a data-driven representation-learning paradigm is realized by deep learning, with its strong nonlinear modeling capability
The Mean Intersection over Union (mIoU), frames per second (FPS), and model parameters under different configurations are reported in Table 5, which shows that the number of atrous pyramid (AP) bottleneck blocks has a more significant impact on the model in the third stage than in the second stage
The densely connected atrous pyramid (DCAP) module and the cross-fusion (CF) module are proposed for the segmentation of complex urban road scenes
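The mIoU reported above is a standard segmentation metric: per-class intersection over union between the predicted and ground-truth label maps, averaged over the classes present. The paper does not give its evaluation code; a minimal sketch of the standard definition (function name `mean_iou` is illustrative) is:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection over Union over the classes present in pred or gt.

    pred, gt: integer label maps of the same shape.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                     # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

# Example: two classes, one mislabeled pixel.
pred = np.array([0, 0, 1, 1])
gt   = np.array([0, 1, 1, 1])
mean_iou(pred, gt, num_classes=2)        # (1/2 + 2/3) / 2 = 7/12
```

Benchmarks such as Cityscapes average the per-class IoU over all annotated evaluation classes; the sketch above follows the same per-class-then-average structure.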
Summary
The transformation from an experience-driven, hand-crafted feature paradigm to a data-driven representation-learning paradigm is realized by deep learning, with its strong nonlinear modeling capability. PSPNet [2] and DeepLab [3] have 250.8M and 262.1M parameters respectively; both exceed 100 layers, and their inference speed falls far below the 24 frames per second required for video. Such large-scale, high-precision models still require long processing times, even when running on the most advanced modern GPUs [4]. Small segmentation networks that are low in computing cost, fast in inference, and memory-friendly are therefore often desired. In response to these problems, researchers have proposed many deep-learning-based real-time semantic segmentation networks in recent years. Based on the above analysis, a lightweight, densely connected pyramid network (LDPNet) is proposed for real-time semantic segmentation tasks.
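The DCAP module is described only at a high level here: atrous (dilated) convolutions at several rates capture multi-scale context, and dense connections let each branch reuse earlier features. A toy single-channel NumPy sketch of that idea (the helper names `atrous_conv2d` and `dcap_block` are illustrative, and summing earlier feature maps stands in for the channel concatenation a real dense block would use) might look like:

```python
import numpy as np

def atrous_conv2d(x, kernel, rate):
    """'Same'-padded 2-D atrous (dilated) convolution, single channel.

    A kernel tap at offset u is sampled rate pixels apart, so the
    receptive field grows with the dilation rate at no extra cost.
    """
    k = kernel.shape[0]
    pad = rate * (k - 1) // 2
    xp = np.pad(x, pad)
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            for u in range(k):
                for v in range(k):
                    out[i, j] += kernel[u, v] * xp[i + u * rate, j + v * rate]
    return out

def dcap_block(x, rates=(1, 2, 4)):
    """Toy densely connected atrous pyramid: each branch sees all
    earlier feature maps (summed here as a stand-in for concatenation)."""
    kernel = np.ones((3, 3)) / 9.0          # simple averaging filter
    feats = [x]
    for r in rates:
        feats.append(atrous_conv2d(sum(feats), kernel, r))
    return np.stack(feats[1:])              # one context map per scale
```

A real implementation would operate on multi-channel tensors with learned kernels (e.g. framework-provided dilated convolutions), but the structure, parallel branches with increasing dilation rates plus dense feature reuse, is the same.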