Abstract

In this paper, we propose a novel method to precisely match two aerial images that were obtained in different environments via a two-stream deep network. By internally augmenting the target image, the network feeds three input images through its two streams and reflects the additional augmented pair in training. As a result, the training process of the deep network is regularized, and the network becomes robust to the variations of aerial images. Furthermore, we introduce an ensemble method based on a bidirectional network, which is motivated by the isomorphic nature of the geometric transformation. We obtain two sets of global transformation parameters without any additional network or parameters, which alleviates asymmetric matching results and enables a significant performance improvement by fusing the two outcomes. For the experiments, we adopt aerial images from Google Earth and the International Society for Photogrammetry and Remote Sensing (ISPRS). To quantitatively assess our results, we apply the probability of correct keypoints (PCK) metric, which measures the degree of matching. The qualitative and quantitative results show a sizable performance gap compared to conventional methods for matching aerial images. All code, our trained model, and the dataset are available online.
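The abstract does not spell out the fusion rule for the two directional estimates. As one plausible reading, here is a minimal NumPy sketch that assumes a 6-parameter affine transformation (as in CNNGeo-style models) and fuses the forward (source-to-target) parameters with the inverse of the backward (target-to-source) parameters by simple averaging; the function names and the averaging choice are hypothetical, not the paper's confirmed method:

    import numpy as np

    def to_matrix(theta):
        # 6-parameter affine vector -> 3x3 homogeneous matrix
        m = np.eye(3)
        m[:2, :] = theta.reshape(2, 3)
        return m

    def fuse_bidirectional(theta_st, theta_ts):
        # If the network were perfectly consistent, the forward transform
        # and the inverted backward transform would coincide; averaging
        # them suppresses the asymmetry between the two directions.
        # Assumes the backward affine matrix is invertible.
        forward = to_matrix(theta_st)
        backward_inv = np.linalg.inv(to_matrix(theta_ts))
        fused = 0.5 * (forward + backward_inv)
        return fused[:2, :].reshape(-1)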

Highlights

  • Aerial image matching is a geometric process of aligning a source image with a target image

  • Although CNNGeo fine-tuned on aerial images shows somewhat tolerable performance, our method considerably outperforms it for all tested values of the PCK threshold τ (see the PCK sketch after these highlights)

  • We propose a novel approach based on a deep end-to-end network for aerial image matching
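The PCK metric referenced above counts the fraction of annotated keypoints whose predicted locations, after applying the estimated transformation, fall within a distance threshold derived from τ of their ground-truth locations. As a minimal sketch, assuming keypoints given as (x, y) pixel coordinates and a threshold normalized by the larger image dimension (a common convention; the paper's exact normalization may differ, and all names here are hypothetical):

    import numpy as np

    def pck(pred_pts, gt_pts, img_size, tau=0.05):
        # pred_pts, gt_pts: (N, 2) arrays of (x, y) pixel coordinates
        # img_size: (H, W) of the reference image
        # tau: tolerance as a fraction of the larger image dimension
        dists = np.linalg.norm(pred_pts - gt_pts, axis=1)
        return float(np.mean(dists <= tau * max(img_size)))

Sweeping tau and re-evaluating yields the per-τ comparison reported in the highlights.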

Introduction

Aerial image matching is a geometric process of aligning a source image with a target image. Both images display the same scene but are obtained in different environments, such as different times, viewpoints, and sensors. It is a prerequisite for a variety of aerial image tasks such as change detection, image fusion, and image stitching. In conventional computer vision approaches, correspondences between the two images are computed by hand-crafted algorithms (such as SIFT [1], SURF [2], HOG [3], and ASIFT [4]), followed by estimating the global geometric transformation using RANSAC [5] or the Hough transform [6,7]. These approaches are not very successful for aerial images due to their high resolution, computational costs, large-scale transformations, and variations in the environment.
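For reference, this conventional two-stage pipeline, hand-crafted keypoint matching followed by robust global estimation, can be sketched in a few lines of OpenCV; the file paths and the ratio-test threshold below are placeholders:

    import cv2
    import numpy as np

    src = cv2.imread("source.png", cv2.IMREAD_GRAYSCALE)
    tgt = cv2.imread("target.png", cv2.IMREAD_GRAYSCALE)

    # Detect and describe local features with SIFT.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(src, None)
    kp2, des2 = sift.detectAndCompute(tgt, None)

    # Match descriptors and keep pairs passing Lowe's ratio test.
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    # Robustly estimate a global transformation from noisy matches.
    src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    tgt_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src_pts, tgt_pts, cv2.RANSAC, 5.0)

Steps such as the ratio test and the RANSAC reprojection threshold (5.0 px here) require hand tuning, which is part of why such pipelines struggle under the large appearance and scale changes typical of aerial imagery.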

