Abstract
Traditional monocular Visual Odometry (VO) is typically based on the assumption of a static environment; it therefore performs poorly in dynamic scenes, suffering from error accumulation and scale drift. To address these issues, a novel framework named DyPanVO is proposed, which incorporates multiple deep neural networks for feature point extraction, panoptic segmentation, and depth estimation. First, learning-based methods are employed: SuperPoint for robust feature extraction and LightGlue for accurate feature matching. Then, outlier points are eliminated by combining dynamic priors from panoptic segmentation with epipolar geometry constraints to determine the motion state of each object, as sketched in the example below. In addition, the introduction of depth information recovers absolute scale and fundamentally addresses error accumulation. Experiments on two types of scenes (outdoor and indoor) demonstrate the effectiveness of DyPanVO. Moreover, comparison results show that it performs comparably with multi-view geometry-based methods and outperforms learning-based ones.
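The following is a minimal sketch (not the authors' code) of the matching and outlier-rejection steps the abstract describes, assuming the public SuperPoint/LightGlue API from the cvg/LightGlue package and OpenCV for the epipolar check. The file paths, the 1.0-pixel threshold, and the fusion with a panoptic dynamic-prior mask are illustrative assumptions.

```python
# Sketch under assumptions: SuperPoint + LightGlue matching, then an
# epipolar-constraint filter for dynamic (moving) points.
import cv2
import numpy as np
import torch
from lightglue import LightGlue, SuperPoint
from lightglue.utils import load_image, rbd

device = "cuda" if torch.cuda.is_available() else "cpu"
extractor = SuperPoint(max_num_keypoints=2048).eval().to(device)
matcher = LightGlue(features="superpoint").eval().to(device)

# Placeholder frame paths (hypothetical).
img0 = load_image("frame_000.png").to(device)
img1 = load_image("frame_001.png").to(device)

# SuperPoint keypoints/descriptors, matched by LightGlue.
feats0 = extractor.extract(img0)
feats1 = extractor.extract(img1)
matches01 = matcher({"image0": feats0, "image1": feats1})
feats0, feats1, matches01 = [rbd(x) for x in (feats0, feats1, matches01)]
m = matches01["matches"]                          # (K, 2) index pairs
pts0 = feats0["keypoints"][m[:, 0]].cpu().numpy()
pts1 = feats1["keypoints"][m[:, 1]].cpu().numpy()

# Fundamental matrix from all matches via RANSAC; the dominant (static)
# background is assumed to supply the majority of the correspondences.
F, _ = cv2.findFundamentalMat(pts0, pts1, cv2.FM_RANSAC, 1.0, 0.999)

def epipolar_distance(F, p0, p1):
    """Distance of each p1 from its epipolar line l = F @ p0_homogeneous."""
    p0_h = np.hstack([p0, np.ones((len(p0), 1))])
    p1_h = np.hstack([p1, np.ones((len(p1), 1))])
    lines = (F @ p0_h.T).T                        # epipolar lines in image 1
    num = np.abs(np.sum(lines * p1_h, axis=1))
    return num / np.linalg.norm(lines[:, :2], axis=1)

# Points violating the epipolar constraint are treated as dynamic outliers.
# In the paper this test is combined with the panoptic dynamic prior (e.g.
# pedestrians, vehicles); that mask lookup is omitted here.
static = epipolar_distance(F, pts0, pts1) < 1.0   # assumed 1 px threshold
pts0_static, pts1_static = pts0[static], pts1[static]
```

The surviving static correspondences would then feed the pose estimation stage, with depth predictions supplying metric scale.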