Abstract

In near real-time photogrammetry, the first step in processing each newly added image is to determine, quickly and accurately, the most relevant image among the unordered images acquired before it (the pre-sequence images). This step is pivotal for accurate image matching and 3D reconstruction. This paper presents a hierarchical image retrieval algorithm based on multiple features and details the choice of feature representation, which is critical to the accuracy of the algorithm. First, we represent global features using AlexNet-FC7 (a fully connected layer) or ResNet101-Pool5 (a pooling layer) and local features using SIFT (scale-invariant feature transform), in two parallel threads with GPU (Graphics Processing Unit) support. Next, we obtain candidates based on the cosine similarity between the global features of each pre-sequence image and those of the newly added image. Finally, we determine the most relevant image among those candidates according to the feature matching results between each candidate and the newly added image. The experimental results confirm that the second step is fast and that the third step is necessary because global features cannot distinguish objects of the same class. In total, our algorithm takes about 83.6 ms to determine the most relevant image among 5063 pre-sequence unordered images of size $1024\times 768$, which outperforms exhaustive pairwise matching, Bag of Words and multi-vocabulary trees. The accuracy of our algorithm is also better than that of state-of-the-art methods on three benchmark datasets. The SIFT matching results obtained in the third step, after mismatches are eliminated with RANSAC (Random Sample Consensus), can also be used for high-precision incremental SFM (Structure from Motion) reconstruction.
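
The two retrieval stages described in the abstract (candidate selection by cosine similarity of global features, then a final choice by local feature matching) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `top_k_by_cosine` and `most_relevant_image`, the candidate count `k`, and the `count_matches` callback are assumptions made for illustration. In practice `count_matches` would return something like the number of SIFT matches that survive RANSAC for a given candidate; here it is replaced by a lookup into made-up numbers.

```python
import numpy as np

def top_k_by_cosine(query_feat, db_feats, k):
    """Return the indices of the k database rows most similar to the query."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q                       # cosine similarity with every row
    k = min(k, len(sims))
    order = np.argsort(-sims)[:k]       # highest similarity first
    return order, sims[order]

def most_relevant_image(query_feat, db_feats, count_matches, k=10):
    """Stage 1: shortlist by global features. Stage 2: pick by match count.

    count_matches is a hypothetical callback: it takes a database index and
    returns how many local feature matches that image shares with the query.
    """
    candidates, _ = top_k_by_cosine(query_feat, db_feats, k)
    scores = [count_matches(int(i)) for i in candidates]
    return int(candidates[int(np.argmax(scores))])

# Toy data (made up): four 3-D "global features" and fake match counts.
db_feats = np.array([[1.0, 0.0, 0.0],
                     [0.9, 0.1, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
query_feat = np.array([1.0, 0.05, 0.0])
fake_match_counts = {0: 12, 1: 85, 2: 300, 3: 0}

best = most_relevant_image(query_feat, db_feats, fake_match_counts.get, k=2)
print(best)
```

With these toy values, image 2 has the largest match count but never reaches the second stage, because its global feature is far from the query. Only the shortlisted candidates incur the cost of local feature matching, which is the point of the hierarchy.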

Highlights

  • For large-scale real-time photogrammetry, each image transmitted to the PC must be processed synchronously in real time

  • Since the 1990s, two kinds of retrieval methods have been developed to achieve accurate and fast retrieval of correlated images [1] at a large scale: those based on local features, such as SIFT, and those based on global features, such as those extracted by Convolutional Neural Networks (CNN)

  • Experimental results and analysis: several instance-level datasets commonly used in image retrieval were adopted in the experiments, including the Holidays Dataset (1491 images in 500 groups of similar images) [23], the Oxford Buildings Dataset (5063 images collected from Flickr by searching for the names of 11 different landmarks in Oxford) [24] and the Oxford Paris [25]

Summary

Introduction

For large-scale real-time photogrammetry, each image transmitted to the PC must be processed synchronously in real time. The first step is to determine, quickly and accurately, the most relevant images among a large number of pre-sequence images for each rectified image transmitted to the PC, without geographic location information. The quality of this step has a direct impact on the subsequent matching and stereo model reconstruction. As photogrammetry and computer vision technology advance, content-based image retrieval (CBIR) can be adopted in this step. CBIR is the process of searching for images by their visual content. Since the 1990s, two kinds of retrieval methods have been developed to achieve accurate and fast retrieval of correlated images [1] at a large scale: those based on local features, such as SIFT (scale-invariant feature transform), and those based on global features, such as those extracted by Convolutional Neural Networks (CNN).

