Abstract

The trade-off between speed and accuracy is central to semantic segmentation, especially on resource-constrained platforms such as intelligent vehicles. In this paper, we address this issue by proposing MLFNet, a real-time semantic segmentation architecture designed for practical deployment. Specifically, we first build a lightweight backbone (SEFE) with a larger receptive field and multi-scale contextual representation capability to encode pixel-level features. To better preserve target boundaries and contours, a spatial compensation branch (SPFE) is designed to gradually reduce the dimension of the feature maps and refine low-level details. In the decoding phase, we introduce a multi-branch fusion extractor (MBFD) that integrates the spatial details into the high-level layers. Finally, the outputs of the semantic and spatial branches are fused to predict the final segmentation result. Extensive offline and online experiments show that our model achieves a superior speed-accuracy trade-off. On the Cityscapes test set with 512 × 1024 inputs, MLFNet-Res18 achieves 71.0% mIoU at 95.1 FPS, and MLFNet-Res34 achieves 72.1% mIoU at 72.2 FPS. Moreover, MLFNet-Res18 reaches 24.5 FPS when deployed on an NVIDIA Jetson AGX Xavier and 64.0 FPS on an experimental vehicle.
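For readers unfamiliar with the two quantities reported above, the following is a minimal sketch of how mean intersection-over-union (mIoU) and per-frame latency are commonly computed. It is not the authors' evaluation code. The function names (`confusion_matrix`, `mean_iou`, `ms_per_frame`) are hypothetical, and the ignore label of 255 is an assumption based on a common convention for Cityscapes training labels.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Build a num_classes x num_classes matrix from flat lists of labels.

    Rows index the ground-truth class, columns the predicted class.
    ignore_index=255 is an assumed convention, not taken from the paper.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip pixels that carry no valid ground-truth label
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average the per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # true class c, predicted otherwise
        fp = sum(matrix[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def ms_per_frame(fps):
    """Convert a throughput in frames per second to latency in milliseconds."""
    return 1000.0 / fps


# Toy example with three classes; the last pixel is ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
miou = mean_iou(confusion_matrix(pred, target, num_classes=3))
```

Under this conversion, the reported 95.1 FPS corresponds to roughly 10.5 ms per frame, and 24.5 FPS on the embedded device to roughly 40.8 ms per frame.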
