Abstract

As a safety-critical application, autonomous driving requires high-quality semantic segmentation and real-time performance for deployment. Existing methods commonly suffer from information loss and a massive computational burden due to high-resolution input-output and multiscale learning schemes, which runs counter to real-time requirements. In contrast to the channelwise information modeling commonly adopted by modern networks, in this article, we propose a novel real-time driving scene parsing framework named NDNet from the novel perspective of spacewise neighbor decoupling (ND) and neighbor coupling (NC). We first define and implement the reversible operations called ND and NC, which realize lossless resolution conversion via complementary thumbnail sampling and collation to facilitate spatial modeling. Based on ND and NC, we further propose three modules, namely, the local capturer and global dependence builder (LCGB), the spacewise multiscale feature extractor (SMFE), and the high-resolution semantic generator (HSG), which form the whole pipeline of NDNet. The LCGB serves as a stem block that preprocesses the large-scale input for fast but lossless resolution reduction and extracts initial features with global context. The SMFE is then used for dense feature extraction and can obtain rich multiscale features in the spatial dimension with less computational overhead. For high-resolution semantic output, the HSG is designed for fast resolution reconstruction and adaptive semantic confusion amending. Experiments show the superiority of the proposed method: NDNet achieves state-of-the-art performance on the Cityscapes benchmark, reporting 76.47% mIoU at 240+ frames/s and 78.8% mIoU at 150+ frames/s. Code is available at https://github.com/LiShuTJ/NDNet.
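To make the ND/NC idea concrete: the abstract describes them as reversible operations that losslessly trade spatial resolution for complementary thumbnails. This behavior matches a space-to-depth rearrangement and its inverse, so the sketch below illustrates the round trip using PyTorch's `pixel_unshuffle`/`pixel_shuffle` as stand-ins. This is an assumption for illustration only; the authors' actual implementation may differ (see the linked repository).

```python
# Minimal sketch of neighbor decoupling (ND) and neighbor coupling (NC),
# assuming they act like space-to-depth and its inverse
# (pixel_unshuffle / pixel_shuffle); the paper's code may differ.
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)  # (batch, channels, height, width)

# ND: sample each 2x2 neighborhood into 4 complementary
# half-resolution thumbnails stacked along the channel axis.
# Shape: (1, 3, 8, 8) -> (1, 12, 4, 4); no pixels are discarded.
thumbs = F.pixel_unshuffle(x, downscale_factor=2)

# NC: collate the complementary thumbnails back to full resolution.
# Shape: (1, 12, 4, 4) -> (1, 3, 8, 8).
restored = F.pixel_shuffle(thumbs, upscale_factor=2)

# The round trip is exact, i.e., the conversion is lossless and reversible.
assert torch.equal(x, restored)
```

Under this reading, ND enables the low-resolution, low-cost processing the abstract credits to the LCGB and SMFE, while NC supports the fast resolution reconstruction attributed to the HSG.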
