Abstract

Most current visual odometry (VO) methods follow a standard pipeline of camera calibration, feature extraction, feature matching (or tracking), motion estimation, and local optimization. Although some of these methods achieve high accuracy and robustness, they typically require careful design and scenario-specific fine-tuning to maintain performance, and recovering the absolute scale in monocular VO usually requires prior knowledge. In this paper, we propose an end-to-end framework for training and deployment, based on U-Net and deep recurrent neural networks (RNNs), that estimates poses directly from a sequence of raw RGB images. It uses no module of the conventional VO pipeline, so none of that pipeline's parameters need tuning, and it requires no prior knowledge of the scene. Whereas existing deep learning methods predict only the pose between individual frame pairs, our approach first uses a U-Net to automatically learn feature representations effective for VO, and then feeds them to an RNN that implicitly models the dynamics and relationships across the sequence. Exploiting the full temporal context of the frame sequence in this way yields accurate and robust VO localization. Experimental results show that our method outperforms several widely used VO methods.
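For concreteness, below is a minimal PyTorch sketch of the architecture the abstract describes: a U-Net-style convolutional encoder extracts features from stacked consecutive frames, an LSTM models the sequence, and a linear head regresses a 6-DoF relative pose per frame pair. All layer sizes, the pose parameterization (translation plus Euler angles), and the module names are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # One downsampling stage: strided conv + batch norm + ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class UNetEncoder(nn.Module):
    """Downsampling half of a U-Net used as a feature extractor.
    Input: two consecutive RGB frames stacked on channels (6 ch)."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.stages = nn.Sequential(
            conv_block(6, 32), conv_block(32, 64),
            conv_block(64, 128), conv_block(128, 256),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(256, feat_dim)

    def forward(self, x):                      # x: (B, 6, H, W)
        f = self.pool(self.stages(x)).flatten(1)
        return self.proj(f)                    # (B, feat_dim)

class UNetRNNVO(nn.Module):
    """End-to-end VO sketch: encoder per frame pair, LSTM over the
    sequence, head regressing translation (3) + rotation (3, Euler)."""
    def __init__(self, feat_dim=512, hidden=1000):
        super().__init__()
        self.encoder = UNetEncoder(feat_dim)
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 6)

    def forward(self, frames):                 # frames: (B, T, 3, H, W)
        # Stack each frame with its successor on the channel axis.
        pairs = torch.cat([frames[:, :-1], frames[:, 1:]], dim=2)
        B, S = pairs.shape[:2]
        feats = self.encoder(pairs.flatten(0, 1)).view(B, S, -1)
        out, _ = self.rnn(feats)               # temporal context over pairs
        return self.head(out)                  # (B, T-1, 6) relative poses

# Usage: relative poses for a batch of 2 sequences of 5 frames.
model = UNetRNNVO()
poses = model(torch.randn(2, 5, 3, 64, 64))   # -> shape (2, 4, 6)

Because the LSTM conditions each pose estimate on the hidden state carried over from earlier frame pairs, the prediction for a given pair can draw on the whole preceding sequence rather than on that pair alone, which is the "context information of sequence frames" the abstract refers to.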
