Abstract

The extraction of reliable and repeatable interest points across images is a fundamental step in automatic image orientation (Structure-from-Motion). Despite recent progress, open issues remain in challenging conditions such as wide baselines and strong illumination changes. Over the years, traditional hand-crafted methods have been joined by learning-based approaches, which have progressively advanced the state of the art according to recent benchmarks. Notwithstanding these advancements, learning-based methods are often unsuitable for real photogrammetric surveys because they lack rotation invariance, a fundamental requirement for these specific applications. This paper proposes a novel hybrid image matching pipeline that combines hand-crafted and deep-learning components to extract reliable rotation-invariant keypoints optimized for wide-baseline scenarios. The proposed hybrid pipeline was compared with other hand-crafted and learning-based state-of-the-art approaches on several photogrammetric datasets with metric ground-truth data. Results show that the proposed hybrid matching pipeline achieves high accuracy and was the only evaluated method able to register images in the most challenging wide-baseline scenarios.

Highlights

  • Photogrammetry has become a valuable, powerful, automated and cheap alternative to active sensors for the generation of textured 3D models (Remondino et al., 2017)

  • This paper proposes a novel hybrid image matching pipeline that combines hand-crafted and deep-learning components to extract reliable rotation-invariant keypoints optimized for wide-baseline scenarios

  • All compared methods reached similar Root Mean Square Error (RMSE) on the Ventimiglia Theatre Nadiral dataset (see Fig. 6(a)) and were able to orient all images, with the exception of SuperPoint, which failed to orient the whole dataset (see the RI value reported together with the other Bundle Adjustment (BA) statistics in Fig. 6(b); note that, for Metashape, root mean square reprojection errors are reported as gray histogram bars instead of Mean Reprojection Error (MRE) values)


Summary

INTRODUCTION

Photogrammetry has become a valuable, powerful, automated and cheap alternative to active sensors for the generation of textured 3D models (Remondino et al., 2017). While the first attempts in this research direction focused on the different steps of the image matching pipeline separately, more recent solutions provide end-to-end deep networks that jointly optimize all the pipeline steps: LIFT (Yi et al., 2016), LF-Net (Ono et al., 2018), SuperPoint (DeTone et al., 2018), R2D2 (Revaud et al., 2019), D2-Net (Dusmanu et al., 2019), ASLFeat (Luo et al., 2020), etc. This design choice increases keypoint repeatability and reliability as well as the image matching success rate, proving beneficial for the final pose estimation accuracy. However, current end-to-end deep architectures can be unsuitable for general-purpose photogrammetric applications due to their limited ability to handle large image rotations (Remondino et al., 2021). This limitation stems from a deliberate design choice that maximizes the discriminative ability of the matching process in the more common general-user scenarios, where all images are roughly upright. Workarounds based on synthetic training data are not completely satisfactory, as synthetic scenes are generally unable to fully simulate real-world scenarios.
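The rotation invariance mentioned above is exactly what classic hand-crafted detectors obtain through dominant-orientation assignment, the SIFT-style trick of canonicalizing each patch by its strongest gradient direction before description. The sketch below (illustrative only, not the paper's pipeline; the function name and bin count are our own choices) shows the idea in plain NumPy: rotating a patch shifts its dominant orientation by the rotation angle, so rotating the patch back by that angle yields a canonical, rotation-invariant view.

```python
import numpy as np

def dominant_orientation(patch, bins=36):
    """Dominant gradient orientation of a patch, in degrees [0, 360).

    Builds a magnitude-weighted histogram of gradient orientations
    (as in SIFT's orientation assignment) and returns the centre of
    the strongest bin.
    """
    gy, gx = np.gradient(patch.astype(float))      # image gradients
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0   # orientation in degrees
    hist, edges = np.histogram(ang, bins=bins, range=(0.0, 360.0),
                               weights=mag)
    return edges[np.argmax(hist)] + 180.0 / bins   # centre of strongest bin
```

Rotating the patch (e.g. with `np.rot90`) shifts the returned orientation by the applied rotation modulo 360° (up to the histogram bin width and the y-down angle convention of array coordinates), which is what makes descriptors computed on the re-oriented patch rotation invariant.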

Aim of the paper
Dataset
Evaluation setup
Results
CONCLUSIONS
