Real-time, high-performance semantic segmentation is a crucial but challenging perception task for computationally constrained robots, such as the humanoid NAO used in the RoboCup Soccer Standard Platform League. However, most existing convolutional neural network (CNN)-based semantic segmentation models incur massive computational costs, which prevents them from performing real-time inference on a NAO. In this article, we first publish meticulously annotated datasets for training and evaluating semantic segmentation models. We then propose a fast downsampling module that reduces image resolution while preserving spatial information, and a novel dense learning module that learns high-level semantic information while recovering spatial details. Building on these operations, and using a multiscale fusion method to recover resolution, we propose RoboSeg, an efficient real-time segmentation model aimed primarily at a better speed-accuracy tradeoff. Finally, to support practical engineering applications, we provide a deployment guideline describing how to deploy the CNN model on robots with limited computational resources and achieve real-time performance. The experimental results show that RoboSeg exceeds state-of-the-art networks in RoboCup scene segmentation: it attains a mean IoU of 87.35% and a pixel accuracy of 96.88% on our dataset with a model that contains only 0.29M parameters and requires just 0.73 GFLOPs. Under the proposed deployment strategies, the network runs at above 30 FPS on NAO robots with downsampled frames.
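The abstract does not specify the internals of the fast downsampling module, but a minimal sketch of what such a block might look like is given below, assuming an ENet-style design in which a strided convolution runs in parallel with max pooling and the two outputs are concatenated, halving the resolution cheaply while retaining raw spatial information. The class name, channel counts, and layer choices here are hypothetical illustrations, not the published RoboSeg architecture.

```python
import torch
import torch.nn as nn


class FastDownsample(nn.Module):
    """Halve spatial resolution: strided-conv branch + max-pool branch.

    Illustrative assumption inspired by ENet's initial block; not the
    paper's actual module.
    """

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # The conv branch produces (out_ch - in_ch) feature maps so that,
        # after concatenation with the pooled input, the block emits
        # exactly out_ch maps.
        self.conv = nn.Conv2d(in_ch, out_ch - in_ch, kernel_size=3,
                              stride=2, padding=1, bias=False)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenating learned features with the pooled input preserves
        # spatial detail from the raw image at very low extra cost.
        y = torch.cat([self.conv(x), self.pool(x)], dim=1)
        return self.act(self.bn(y))


if __name__ == "__main__":
    block = FastDownsample(in_ch=3, out_ch=16)
    out = block(torch.randn(1, 3, 240, 320))  # e.g. a downsampled NAO frame
    print(out.shape)  # torch.Size([1, 16, 120, 160])
```

This kind of parallel conv/pool block is a common pattern in lightweight segmentation backbones because it reaches a low working resolution in a single layer, which is consistent with the paper's stated goal of a better speed-accuracy tradeoff on constrained hardware.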