Abstract

For AR applications, existing monocular VO methods still fall short of delivering real-time, accurate self-localization in dynamic environments with motion disturbance. This paper proposes CCVO (Cascaded CNNs for Visual Odometry), a monocular VO approach that realizes end-to-end pose estimation with two cascaded CNNs. The first CNN detects trackable feature points and performs semantic segmentation concurrently within milliseconds; feature points belonging to dynamic objects are removed as outliers to reduce their interference with pose estimation. The second CNN takes the static feature points of two consecutive images as input and predicts the transformation matrix at true scale. Our experiments show that CCVO achieves better real-time performance, along with satisfactory positioning accuracy and generalization ability, compared with traditional and DL (Deep Learning)-based VO methods. The results of the geometry consistency check and the forward-backward consistency check also demonstrate its potential as an effective front-end solution of vSLAM for AR applications.
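
The abstract describes a two-stage pipeline: detect feature points and segment the scene, drop points on dynamic objects, then regress the relative pose from the remaining static points of two consecutive frames. The following is a minimal sketch of that data flow, assuming hypothetical stand-in functions for the two CNNs (the class names, thresholds, and placeholder outputs below are illustrative, not the paper's models).

```python
# Minimal sketch of the cascaded CCVO data flow described in the abstract.
# The two "CNN" functions are hypothetical stand-ins, not the paper's trained models.
import numpy as np

DYNAMIC_CLASSES = {"person", "car", "bicycle"}  # assumed dynamic categories


def stage_one_cnn(image):
    """Hypothetical stand-in for the first CNN: returns trackable keypoints
    and a per-keypoint semantic label (a real model predicts both jointly)."""
    h, w = image.shape[:2]
    rng = np.random.default_rng(0)
    keypoints = rng.uniform([0, 0], [w, h], size=(200, 2))
    labels = rng.choice(["road", "building", "person", "car"], size=200)
    return keypoints, labels


def filter_dynamic(keypoints, labels):
    """Remove feature points that fall on dynamic objects (treated as outliers)."""
    keep = np.array([lab not in DYNAMIC_CLASSES for lab in labels])
    return keypoints[keep]


def stage_two_cnn(static_kp_prev, static_kp_curr):
    """Hypothetical stand-in for the second CNN: maps the static feature points
    of two consecutive frames to a 4x4 relative transformation matrix."""
    return np.eye(4)  # identity placeholder; a trained model would predict the pose at true scale


def ccvo_step(frame_prev, frame_curr):
    """One VO step: detect + segment, filter dynamic points, regress relative pose."""
    kp_prev, lab_prev = stage_one_cnn(frame_prev)
    kp_curr, lab_curr = stage_one_cnn(frame_curr)
    static_prev = filter_dynamic(kp_prev, lab_prev)
    static_curr = filter_dynamic(kp_curr, lab_curr)
    return stage_two_cnn(static_prev, static_curr)


if __name__ == "__main__":
    prev = np.zeros((480, 640, 3), dtype=np.uint8)   # dummy consecutive frames
    curr = np.zeros((480, 640, 3), dtype=np.uint8)
    T = ccvo_step(prev, curr)                        # 4x4 relative pose estimate
    print(T)
```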
