Abstract

Deep learning-based visual odometry (VO) has recently attracted much attention. In computer vision tasks, deep models are often more robust than hand-crafted features. However, their limited reliability and generalization ability constrain their application in VO systems. Existing approaches address these issues by creating more realistic datasets or by employing strategies such as unsupervised learning and online fine-tuning. In contrast, we tackle these issues from the standpoint of model structure. We present a simple yet robust VO system, SF-VO, based on a specially designed sparse optical flow network. We show that decomposing the global optimization problem into single-point optimization makes this network suitable for VO applications. Combining the network with a consistency verification module and a Perspective-n-Point (PnP) solver, we form a frame-to-frame VO system that follows the traditional pose estimation pipeline. Extensive experiments show that the proposed VO system generalizes effectively to real scenes even though only synthetic datasets are used for training. The proposed model also outperforms other deep learning-based methods with a model size of only 1.69 M. Comparisons with state-of-the-art optical flow models and additional expansion experiments further confirm that the designed network achieves a higher level of generalization and can be trained on limited datasets.
