Abstract

Visual SLAM (Simultaneous Localization and Mapping) has been widely used for the localization and path planning of unmanned systems. However, the map created by a visual SLAM system contains only low-level geometric information; an unmanned system can work better if high-level semantic information is included. In this paper, we propose a visual semantic SLAM method using a DCNN (Deep Convolutional Neural Network). The network is composed of feature extraction, multi-scale processing, and classification layers. We apply atrous convolution to GoogLeNet for feature extraction, which increases the speed of the network and the resolution of the feature map. Spatial pyramid pooling is used for multi-scale processing, and Softmax is used in the classification layers. The results show that the mIoU of our network on PASCAL VOC 2012 is 0.658 and that inference on an image of size 256 × 212 takes 101 ms on an NVIDIA Jetson TX2 embedded module, making the network usable in real-time visual SLAM.
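
The pipeline described above (atrous convolution for feature extraction, spatial pyramid pooling for multi-scale context, Softmax for per-pixel classification) can be sketched in PyTorch as follows. This is a minimal illustration under assumed channel counts, dilation rates, and output stride, not the authors' implementation; `ASPPHead` and every parameter in it are hypothetical.

```python
import torch
import torch.nn as nn

class ASPPHead(nn.Module):
    """Illustrative multi-scale head: parallel atrous (dilated) 3x3
    convolutions at several rates, concatenated and classified per pixel.
    All sizes and rates here are assumptions, not the paper's values."""

    def __init__(self, in_channels=1024, num_classes=21, rates=(1, 6, 12, 18)):
        super().__init__()
        # One 3x3 atrous convolution per dilation rate; setting
        # padding == dilation keeps the feature map's spatial size,
        # so resolution is preserved while the receptive field grows.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_channels, 256, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        # 1x1 convolution produces per-pixel class scores.
        self.classifier = nn.Conv2d(256 * len(rates), num_classes, kernel_size=1)

    def forward(self, features):
        # Concatenate the multi-scale branches along the channel axis,
        # then apply Softmax over channels for class probabilities.
        multi_scale = torch.cat([b(features) for b in self.branches], dim=1)
        return torch.softmax(self.classifier(multi_scale), dim=1)

# Usage: features from a backbone (e.g., a GoogLeNet-style extractor).
# At an assumed output stride of 16, a 256x212 input yields a feature
# map of roughly 16x13.
head = ASPPHead()
feats = torch.randn(1, 1024, 16, 13)
probs = head(feats)  # (1, 21, 16, 13) per-pixel class probabilities
```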
