Modern UAVs (unmanned aerial vehicles) equipped with video cameras can provide large-scale high-resolution video data. This poses significant challenges for structure from motion (SfM) and simultaneous localization and mapping (SLAM) algorithms, as most of them are developed for relatively small-scale and low-resolution scenes. In this paper, we present a video-based SfM method specifically designed for high-resolution large-size UAV videos. Despite the wide range of applications for SfM, performing mainstream SfM methods on such videos poses challenges due to their high computational cost. Our method consists of three main steps. Firstly, we employ a visual SLAM (VSLAM) system to efficiently extract keyframes, keypoints, initial camera poses, and sparse structures from downsampled videos. Next, we propose a novel two-step keypoint adjustment method. Instead of matching new points in the original videos, our method effectively and efficiently adjusts the existing keypoints at the original scale. Finally, we refine the poses and structures using a rotation-averaging constrained global bundle adjustment (BA) technique, incorporating the adjusted keypoints. To enrich the resources available for SLAM or SfM studies, we provide a large-size (3840 × 2160) outdoor video dataset with millimeter-level-accuracy ground control points, which supplements the current relatively low-resolution video datasets. Experiments demonstrate that, compared with other SLAM or SfM methods, our method achieves an average efficiency improvement of 100% on our collected dataset and 45% on the EuRoc dataset. Our method also demonstrates superior localization accuracy when compared with state-of-the-art SLAM or SfM methods.
Read full abstract