Abstract

This paper addresses image semantic segmentation for the machine vision system of an off-road autonomous robotic vehicle. Most modern convolutional neural networks require computing resources beyond the capabilities of many robotic platforms. The main drawback of such models is therefore their extremely high complexity, whereas real applications must run in real time on devices with limited resources. This paper focuses on the practical application of modern lightweight architectures to semantic segmentation on mobile robotic systems. We consider backbones based on ResNet18, ResNet34, MobileNetV2, ShuffleNetV2, and EfficientNet-B0, and decoders based on U-Net, DeepLabV3, and DeepLabV3+, as well as additional components that can increase segmentation accuracy and reduce inference time. We propose a model that combines a ResNet34 encoder and a DeepLabV3+ decoder with Squeeze-and-Excitation blocks, which offered the best trade-off between inference time and accuracy. We also present our off-road dataset and a simulated dataset for semantic segmentation. Pretraining on the simulated dataset increased mIoU on our off-road dataset by 2.6 % compared with pretraining on Cityscapes. Finally, we achieved 76.1 % mIoU on the Cityscapes validation set and 85.4 % mIoU on our off-road validation set at 37 FPS (frames per second) for a 1024×1024 input image on a single NVIDIA GeForce RTX 2080 card using the NVIDIA TensorRT inference framework.
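
The mIoU figures quoted above are mean intersection-over-union scores. As a minimal sketch of how this metric is commonly computed, assuming integer class-index label maps and NumPy, the function below builds a confusion matrix and averages per-class IoU. The function name `mean_iou`, the `ignore_index=255` default (the usual Cityscapes convention for unlabeled pixels), and the choice to average only over classes that occur are illustrative assumptions; the exact evaluation protocol used in the paper is not stated in the abstract.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in the prediction or ground truth.

    Assumes pred and target hold integer class ids in [0, num_classes),
    except that target may also contain ignore_index (an assumed convention).
    """
    pred = np.asarray(pred).ravel().astype(np.int64)
    target = np.asarray(target).ravel().astype(np.int64)

    # Drop pixels marked as "ignore" in the ground truth.
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Average only over classes with a non-empty union.
    present = union > 0
    return float((intersection[present] / union[present]).mean())
```

Benchmark scores are usually obtained by accumulating the confusion matrix over the whole validation set before taking the ratio, rather than averaging per-image values; the sketch handles that case if the concatenated predictions and labels are passed in one call.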
