Abstract

Detecting similarities between image patches and measuring their mutual displacement are important steps in the registration of multimodal remote sensing (RS) images. Deep learning approaches advance the discriminative power of learned similarity measures (SMs). However, their ability to find the best spatial alignment of the compared patches is often ignored. We propose to unify the patch discrimination and localization problems by assuming that the more accurately two patches can be aligned, the more similar they are. The uncertainty, or confidence, in the localization of a patch pair then serves as a similarity measure for these patches. We train a two-channel patch matching convolutional neural network (CNN), called DLSM, to solve a regression problem with uncertainty. This CNN takes two multimodal patches as input and outputs a prediction of the translation vector between them, together with the uncertainty of this prediction in the form of an error covariance matrix of the translation vector. The proposed patch matching CNN thus predicts a two-dimensional normal distribution of the translation vector rather than a single point estimate. The determinant of the covariance matrix is used both as a measure of uncertainty in the matching of patches and as a measure of similarity between patches. For training, we use a Siamese architecture with three towers: two towers receive the same pair of multimodal patches shifted by a random translation, while the third tower is fed a pair of dissimilar patches. Experiments performed on a large base of real RS images show that the proposed DLSM has both a higher discriminative power and more precise localization than existing hand-crafted SMs and SMs trained with conventional losses. Unlike existing SMs, DLSM correctly predicts the translation error distribution ellipse for different modalities, noise levels, and isotropic and anisotropic structures.
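
The core idea of the abstract, regressing a translation vector together with an error covariance matrix, can be sketched as a Gaussian negative log-likelihood loss. The following minimal sketch is an illustration, not the authors' code: the Cholesky parameterization of the covariance and the softplus constraint on its diagonal are our assumptions, chosen only to keep the predicted covariance positive definite.

```python
# Minimal sketch of "regression with uncertainty": the network head
# predicts a 2-D translation mean and a covariance Sigma = L L^T via a
# lower-triangular Cholesky factor L; the training loss is the 2-D
# Gaussian negative log-likelihood, and det(Sigma) serves as the
# uncertainty / (dis)similarity score. Parameterization is assumed.
import torch

def gaussian_nll(mean, chol_params, target):
    """mean: (B, 2) predicted (dx, dy); chol_params: (B, 3) giving
    (l11, l21, l22); target: (B, 2) ground-truth translation."""
    l11 = torch.nn.functional.softplus(chol_params[:, 0])  # diag > 0
    l21 = chol_params[:, 1]
    l22 = torch.nn.functional.softplus(chol_params[:, 2])

    r = target - mean                       # residual
    # Solve L z = r for the 2x2 triangular system in closed form
    z1 = r[:, 0] / l11
    z2 = (r[:, 1] - l21 * z1) / l22
    mahalanobis = z1 ** 2 + z2 ** 2         # r^T Sigma^{-1} r
    log_det = 2.0 * (torch.log(l11) + torch.log(l22))  # log|Sigma|
    # Constant 2*log(2*pi) term omitted; it does not affect training.
    return 0.5 * (mahalanobis + log_det).mean()

def uncertainty_score(chol_params):
    """det(Sigma) = (l11 * l22)^2; smaller means a more confident match."""
    l11 = torch.nn.functional.softplus(chol_params[:, 0])
    l22 = torch.nn.functional.softplus(chol_params[:, 2])
    return (l11 * l22) ** 2
```

Under this loss, a confidently aligned pair is penalized for large residuals, while an ambiguous pair can lower its loss by inflating the covariance, which is exactly what allows det(Sigma) to double as a similarity measure.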

Highlights

  • A popular strategy for solving the image registration problem involves two steps: finding a set of putative correspondences (PCs) between patches of the registered images and estimating the geometrical transform parameters between these images on the basis of the found PCs [1,2]

  • Both the two-stream convolutional neural network (CNN)-based similarity measures (SMs) trained with different loss functions and the Siamese CNN-based Deep Localization Similarity Measure (DLSM) are compared to five existing multimodal SMs: (1) an SM that combines Mutual Information with a gradient term highlighting large gradients with coinciding orientations in both modalities (GMI, Gradient with Mutual Information) [55]; (2) Scale-Invariant Feature Transform (SIFT)-OCT [15]; (3) Modality Independent Neighborhood Descriptor (MIND) [16]; (4) Histogram of Orientated Phase Congruency (HOPC) [13]; and (5) the L2-Net descriptor CNN [27]

  • We have proposed a new CNN structure for training a multimodal similarity measure that satisfies two properties: a high discriminative power and accurate localization of the compared patches

Summary

Introduction

A popular strategy for solving the image registration problem involves two steps: finding a set of putative correspondences (PCs) between patches of the registered images and estimating the geometrical transform parameters between these images on the basis of the found PCs [1,2]. Our contribution to the patch matching problem is a novel convolutional neural network (CNN), called Deep Localization Similarity Measure (DLSM), designed to improve both discriminative power and localization accuracy compared to existing hand-crafted and learned SMs. Patch discrimination and localization are not addressed as different problems, but rather as two aspects of the same problem. An important feature lacking in existing SMs is that the localization accuracy is predicted, in the form of an error covariance matrix, for each pair of patches, including isotropic and anisotropic textures, patches with low and high SNR, and patches of different modalities. This covariance can be used advantageously to weight PCs during multimodal image registration [1,2,26], as in the sketch below.
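
To illustrate how such per-pair covariances could weight PCs, the following hedged sketch (our assumption, not the paper's procedure) computes a generalized least-squares estimate of a global translation between two images, so that confident matches (small covariance) carry more weight:

```python
# Hypothetical PC weighting: each putative correspondence i supplies a
# predicted shift t_i and an error covariance C_i; the generalized
# least-squares estimate of a common translation weights each shift by
# its information matrix C_i^{-1}.
import numpy as np

def weighted_translation(shifts, covariances):
    """shifts: (N, 2) predicted translations; covariances: (N, 2, 2)."""
    info_sum = np.zeros((2, 2))
    weighted = np.zeros(2)
    for t, c in zip(shifts, covariances):
        w = np.linalg.inv(c)           # information matrix of this PC
        info_sum += w
        weighted += w @ t
    return np.linalg.solve(info_sum, weighted)  # GLS estimate
```

Correspondences with a large predicted covariance contribute little information to the estimate, which is the practical benefit of predicting localization uncertainty per patch pair.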

Overview of Existing Patch Matching CNN Structure and Loss Functions
Discrimination and Localization Ability of Existing Patch Matching CNNs
Requirements to Complexity of Geometrical Transform Between Patches in RS
SM Performance Criteria
Patch Matching as Deep Regression with Uncertainty
Siamese ConvNet Structure and Training Process Settings
Patch Pair Alignment with Subpixel Accuracy
Experimental Part
Multimodal Image Dataset
Discriminative Power Analysis
Method
Patch Matching Uncertainty Analysis
Localization Accuracy Analysis
Findings
Conclusions