Abstract

Real-time semantic segmentation provides precise insights into dynamic street environments for autonomous driving, traffic control, and urban planning. However, state-of-the-art models built on attention mechanisms and deep convolutional neural networks have improved segmentation quality at the cost of complex architectures and high computational complexity. This study aims to mitigate gridding artifacts and enhance semantic segmentation performance. We propose a multi-level downsampling approach followed by a depth-wise split separable global convolution with a bottleneck to achieve a trade-off between accuracy and inference time. The spatial attention module used in this study effectively preserves low-level spatial characteristics, improving localization accuracy, robustness against disturbances, processing efficiency, and the ability to handle occlusions. Extensive experiments on the publicly available Cityscapes and CamVid datasets show that the proposed model processes high-resolution images efficiently in real time while delivering strong performance, achieving an accuracy of 72.3% on Cityscapes and 72.7% on CamVid.
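
The abstract names two building blocks: a depth-wise split separable global convolution wrapped in a bottleneck, and a spatial attention module. The PyTorch sketch below shows one plausible arrangement of these components, assuming a GCN-style two-branch (k×1 / 1×k) factorization and a CBAM-style spatial attention; the class names, kernel size k=7, and channel widths are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch (not the authors' code): depth-wise split separable
# "global" convolution with a 1x1 bottleneck, plus a CBAM-style spatial
# attention module. All names and hyperparameters are illustrative.
import torch
import torch.nn as nn


class DWSplitSeparableGlobalConv(nn.Module):
    """Large k x k receptive field built from two depth-wise k x 1 / 1 x k
    branches, wrapped in a 1x1 bottleneck to keep computation low."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 7, bottleneck: int = 64):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, bottleneck, kernel_size=1, bias=False)
        # Branch A: (k x 1) followed by (1 x k), both depth-wise (groups=channels).
        self.branch_a = nn.Sequential(
            nn.Conv2d(bottleneck, bottleneck, (k, 1), padding=(k // 2, 0),
                      groups=bottleneck, bias=False),
            nn.Conv2d(bottleneck, bottleneck, (1, k), padding=(0, k // 2),
                      groups=bottleneck, bias=False),
        )
        # Branch B: (1 x k) followed by (k x 1), also depth-wise.
        self.branch_b = nn.Sequential(
            nn.Conv2d(bottleneck, bottleneck, (1, k), padding=(0, k // 2),
                      groups=bottleneck, bias=False),
            nn.Conv2d(bottleneck, bottleneck, (k, 1), padding=(k // 2, 0),
                      groups=bottleneck, bias=False),
        )
        self.expand = nn.Sequential(
            nn.Conv2d(bottleneck, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.reduce(x)
        x = self.branch_a(x) + self.branch_b(x)  # sum the two separable branches
        return self.expand(x)


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: channel-wise mean/max maps -> conv -> sigmoid."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)
        max_map = x.amax(dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn  # re-weight spatial positions, preserving low-level detail


if __name__ == "__main__":
    feats = torch.randn(1, 128, 64, 128)          # a downsampled feature map
    gcn = DWSplitSeparableGlobalConv(128, 128)    # illustrative channel widths
    sam = SpatialAttention()
    out = sam(gcn(feats))
    print(out.shape)                              # torch.Size([1, 128, 64, 128])
```

The bottleneck (1x1 reduce/expand) and the depth-wise factorization are what keep the large effective kernel cheap enough for the real-time inference budget the abstract targets; the attention map re-weights spatial positions rather than channels, which is consistent with the stated goal of retaining low-level spatial detail.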
