Abstract

Real-time driving scene parsing using semantic segmentation is an essential yet challenging task for an autonomous driving system, where both efficiency and accuracy need to be considered simultaneously. In this article, we propose an efficient and high-performance deep neural network called feature selective fusion network (FSFnet) for robust semantic segmentation of road scenes. Since complex driving scene parsing usually requires the fusion of features at different levels or scales, we propose a feature selective fusion module (FSFM) to adaptively merge these features by generating correlated weight maps in both the spatial and channel dimensions. Furthermore, a multiscale context enhancement module is designed based on an asymmetric nonlocal neural network to aggregate both multiscale and global context information. The proposed FSFnet obtains precise segmentation results in real time on the Cityscapes and CamVid data sets. Specifically, the architecture achieves 77.1% mean pixel intersection-over-union (mIoU) on the Cityscapes test set at a speed of 53 frames per second (FPS) for a $1024\times 2048$ input and 75.1% mIoU on the CamVid test set at a speed of 123 FPS for a $960\times 720$ input on a single NVIDIA 2080 Ti GPU.
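To make the feature-fusion idea concrete, the following is a minimal NumPy sketch of selective fusion of two feature maps via jointly applied channel and spatial weights. It is an illustration of the general technique described above, not the authors' actual FSFM: the function name, the use of global average pooling and channel-mean pooling to derive the weights, and the complementary blend are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feature_selective_fusion(low, high):
    """Adaptively blend two feature maps of shape (C, H, W).

    Hypothetical sketch: a combined response drives a channelwise weight
    (global average pooling) and a spatial weight (mean over channels);
    their product selects between the two input branches per element.
    """
    assert low.shape == high.shape, "feature maps must be aligned first"
    combined = low + high
    # Channel weights, shape (C, 1, 1): one gate per feature channel.
    channel_w = sigmoid(combined.mean(axis=(1, 2), keepdims=True))
    # Spatial weights, shape (1, H, W): one gate per spatial location.
    spatial_w = sigmoid(combined.mean(axis=0, keepdims=True))
    # Correlated weight map in (0, 1); its complement gates the other branch,
    # so every output element is a convex blend of the two inputs.
    w = channel_w * spatial_w
    return w * low + (1.0 - w) * high

# Usage: fuse a low-level and a high-level feature map of matching shape.
rng = np.random.default_rng(0)
low = rng.standard_normal((4, 8, 8))
high = rng.standard_normal((4, 8, 8))
fused = feature_selective_fusion(low, high)
```

Because the weight map lies in (0, 1), each output element is guaranteed to lie between the corresponding elements of the two inputs; in a real network the pooling steps would be replaced by learned convolutions so the selection is trained end to end.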
