Abstract

Semantic segmentation is a challenging and computationally heavy task in computer vision. Recent works have shown promising results in real-time applications such as autonomous driving, scene recognition, and robot navigation. To maintain fast inference speeds, most existing real-time semantic segmentation networks use lightweight decoders or omit them entirely; as a result, their accuracy is significantly lower than that of non-real-time segmentation networks. To reduce this accuracy gap, we introduce two key modules aimed at improving the decoder of a high-performance baseline network for real-time semantic segmentation. Our first module, Dilated Asymmetric Pyramidal Fusion (DAPF), is designed to substantially increase the receptive field on top of the last stage of the encoder, yielding richer contextual features. Our second module, the Multi-resolution Dilated Asymmetric (MDA) module, fuses multi-resolution feature maps from both early and deep stages of the network, simultaneously refining detail and contextual information at multiple stages of the decoder. Both modules exploit contextual information without excessively increasing the computational complexity of the network. Our proposed network, entitled "FASSDNet", reaches 78.8% mIoU on the Cityscapes validation set at 41.1 FPS on full-resolution images (1024×2048). In addition, a lite version of our network reaches 74.1% mIoU at 133.1 FPS (full resolution) on a single NVIDIA GTX 1080 Ti card with no additional acceleration techniques.
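
The abstract gives no implementation details, but to make the dilated asymmetric idea concrete, below is a minimal PyTorch sketch of how a pyramidal fusion block built from dilated asymmetric convolutions could look. The class names, channel counts, and dilation rates (1, 2, 4, 8) are illustrative assumptions for this sketch, not the paper's exact DAPF configuration.

```python
import torch
import torch.nn as nn


class DilatedAsymmetricBranch(nn.Module):
    """One branch: a dilated 3x3 conv factorized into 3x1 and 1x3 convs.

    The factorization keeps the receptive field of the dilated 3x3 conv
    while using fewer parameters and FLOPs (illustrative sketch).
    """

    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.branch = nn.Sequential(
            # 3x1 conv, dilated along the height axis
            nn.Conv2d(channels, channels, kernel_size=(3, 1),
                      padding=(dilation, 0), dilation=(dilation, 1),
                      bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            # 1x3 conv, dilated along the width axis
            nn.Conv2d(channels, channels, kernel_size=(1, 3),
                      padding=(0, dilation), dilation=(1, dilation),
                      bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.branch(x)


class PyramidalFusion(nn.Module):
    """Parallel branches with increasing dilation rates, fused by a 1x1 conv.

    Applied on top of the encoder's last stage, the growing dilations
    enlarge the receptive field at low computational cost (assumed layout).
    """

    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            DilatedAsymmetricBranch(channels, d) for d in dilations
        )
        # 1x1 conv projects the concatenated branches back to `channels`
        self.fuse = nn.Conv2d(channels * len(dilations), channels,
                              kernel_size=1, bias=False)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


if __name__ == "__main__":
    feats = torch.randn(1, 128, 32, 64)   # e.g. encoder output at 1/32 scale
    out = PyramidalFusion(128)(feats)
    print(out.shape)                       # torch.Size([1, 128, 32, 64])
```

Factorizing each dilated 3x3 convolution into a 3x1 and a 1x3 convolution enlarges the receptive field at roughly two thirds of the parameter cost of the full kernel, which is one plausible way such modules exploit context without excessively increasing network complexity.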
