Abstract

Structure from motion (SfM) recovers scene structure and camera poses from feature matches, and it struggles in ambiguous scenes. Such scenes, which contain many duplicate structures and textures, are common in real environments. The ambiguity produces incorrect feature matches between images with similar appearance, which in turn causes geometric misalignment in SfM. To address this problem, recent methods investigate inconsistencies in the feature topology among multi-view images. However, because the feature topology is derived directly from 2D images, it is susceptible to feature occlusion caused by viewpoint changes. We therefore propose a new method that disambiguates scenes using pose consistency rather than feature consistency. Pose consistency is evaluated in 3D geometric space, which is less sensitive to feature occlusion, making it more robust than feature consistency. Our core insight is that incorrect matches between ambiguous images cause a pose deviation from the global poses generated by correct matches. To detect this deviation, we first combine local and global information of the scene to generate reliable global camera poses: the local information of each image is obtained by image clustering, and it reinforces the global information represented as a verified maximum spanning tree of clusters. The resulting global poses then serve as the reference for pose consistency verification, enabling both rotation and translation consistency checks for uncertain matches. Because the pose deviation computed at the image level may be too small to detect, we perform pose consistency verification at the cluster level instead, which amplifies the deviation. In experiments, we compared our approach with several state-of-the-art methods, including COLMAP, Geodesic-SfM, and TC-SfM, on both ambiguous and regular datasets. The results show that our approach is the most robust: it is the only method that succeeds on all ambiguous image sequences (14/14). Quantitative evaluation on sequences with ground truth also shows that our approach achieves the best accuracy among all methods (average translation RMSE = 0.109, average rotation RMSE = 0.827). The source code is publicly available at https://github.com/gongyeted/MA-SfM.
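To make the core pose-consistency idea concrete, the following minimal Python/NumPy sketch compares the relative pose implied by a two-view match against the relative pose predicted by global reference poses. This is an illustration under stated assumptions, not the authors' implementation: it assumes world-to-camera poses (x_cam = R x + t), and all function names and thresholds are hypothetical.

import numpy as np

def rotation_angle_deg(R):
    # Geodesic angle (degrees) of a 3x3 rotation matrix.
    cos = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def pose_deviation(R_i, t_i, R_j, t_j, R_rel, t_rel):
    # Compare the relative pose (R_rel, t_rel) estimated from feature
    # matches with the relative pose implied by the global reference
    # poses (R_i, t_i) and (R_j, t_j). Returns (rotation error in deg,
    # translation-direction error in deg). Translation is compared by
    # direction only, since two-view geometry fixes it up to scale.
    R_ref = R_j @ R_i.T                 # global relative rotation i -> j
    t_ref = t_j - R_ref @ t_i           # global relative translation i -> j
    rot_err = rotation_angle_deg(R_rel @ R_ref.T)
    u = t_ref / np.linalg.norm(t_ref)
    v = t_rel / np.linalg.norm(t_rel)
    trans_err = np.degrees(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))
    return rot_err, trans_err

def is_consistent(rot_err, trans_err, rot_thresh=5.0, trans_thresh=10.0):
    # Hypothetical thresholds: an incorrect match between ambiguous images
    # typically shows a large deviation in at least one of the two terms.
    return rot_err < rot_thresh and trans_err < trans_thresh

In the method described above, the same kind of check would be applied at the cluster level rather than between individual image pairs, where the accumulated deviation is larger and easier to threshold reliably.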
