Abstract

Real-time semantic segmentation of traffic scenes plays an essential part in autonomous driving. Encoder–decoder network architectures effectively combine the contextual information and the fine detail that semantic segmentation requires. Achieving a good balance between inference speed and accuracy remains a crucial challenge, because many real-time semantic segmentation models reach real-time speed only at the expense of accuracy. This paper presents an encoder–decoder network model based on the Cross Stage Partial (CSP) block for real-time semantic segmentation in traffic scenes. Integrating the CSP block not only reduces the computational overhead but also enhances the feature extraction ability of the network. In addition, we append a Fast Spatial Pyramid Pooling module to the backbone of the network, which aggregates global information at a low computational cost. On an NVIDIA RTX 3090, the middle model of our method achieves a mean intersection over union (mIOU) of 80.8% at 64.3 frames per second (FPS) on the Cityscapes test set and an mIOU of 81.3% at 105.3 FPS on the CamVid test set. The large model of our method achieves an mIOU of 81.5% at 48.4 FPS on the Cityscapes test set. Our source code is available at https://github.com/zhouliguo/cspsg.
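For readers unfamiliar with the accuracy metric quoted above, the following is a minimal sketch of how mIOU is commonly computed. It is not taken from the authors' code. The function name `mean_iou`, its parameters, and the use of 255 as the label for unlabeled pixels are assumptions made for illustration, and predictions are assumed to lie in the range 0 to `num_classes - 1`.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection over union for two flat sequences of class labels.

    Hypothetical helper for illustration; ignore_index=255 is an assumed
    convention for unlabeled pixels, not something stated in the abstract.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled ground-truth pixels are excluded
        if p == t:
            # Pixel is in both the predicted and the true region of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # Pixel belongs to the predicted region of p and the true region of t.
            union[p] += 1
            union[t] += 1
    # Average only over classes that occur in the prediction or the ground truth.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    predicted = [0, 0, 1, 1]
    ground_truth = [0, 1, 1, 255]
    print(mean_iou(predicted, ground_truth, num_classes=2))
```

In benchmark practice the per-class intersections and unions are typically accumulated over every image in the evaluation set before the ratio is taken, so a whole-dataset score corresponds to passing the concatenated labels of all images to a function like this one.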
