Abstract

Estimating a 2D homography from a pair of images is a fundamental task in computer vision. In contrast to most convolutional neural network (CNN)-based homography estimation methods, which use the alternative four-point homography parameterization, in this study we directly estimate the 3 × 3 homography matrix. We show that, after coordinate normalization, the magnitude differences and variances of the elements of the normalized 3 × 3 homography matrix are very small. Accordingly, we present STN-Homography, a neural network based on the spatial transformer network (STN), to directly estimate the normalized homography matrix of an image pair. To further decrease the homography estimation error, we propose hierarchical STN-Homography and sequence STN-Homography models, of which the sequence STN-Homography can be trained in an end-to-end manner. The effectiveness of the proposed methods is demonstrated through experiments on the Microsoft common objects in context (MSCOCO) dataset, where they significantly outperform the current state-of-the-art. The average processing times of the three-stage hierarchical STN-Homography and the three-stage sequence STN-Homography models on a GPU are 17.85 ms and 13.85 ms, respectively. Both models satisfy the real-time processing requirements of most potential applications.
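
To make the coordinate normalization concrete, the following Python sketch (illustrative only; the function names and the [-1, 1] target range are assumptions, not the paper's exact recipe) shows one common way to re-express a pixel-space homography in normalized coordinates, which is the kind of transformation that keeps the magnitudes of the nine matrix elements comparable.

```python
import numpy as np

def normalization_matrix(width, height):
    """Map pixel coordinates in [0, w) x [0, h) to the range [-1, 1]."""
    return np.array([
        [2.0 / width, 0.0,          -1.0],
        [0.0,         2.0 / height, -1.0],
        [0.0,         0.0,           1.0],
    ])

def normalize_homography(H_pixel, width, height):
    """Express a pixel-space homography in normalized coordinates.

    If x' ~ H_pixel @ x in pixel coordinates, then in normalized
    coordinates x_n' ~ (T @ H_pixel @ inv(T)) @ x_n, with T the
    normalization matrix above.
    """
    T = normalization_matrix(width, height)
    H_norm = T @ H_pixel @ np.linalg.inv(T)
    return H_norm / H_norm[2, 2]  # fix the scale so that H[2, 2] = 1
```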

Highlights

  • Estimating a 2D homography from a pair of images is a fundamental task in computer vision

  • We found that the performance of the sequence spatial transformer network (STN)-Homography model was better than that of the hierarchical STN-Homography model

  • The mean corner error of the single STN-Homography model was 4.85 pixels, smaller than that of the state-of-the-art one-stage, convolutional neural network (CNN)-based four-point homography estimation methods (a sketch of this metric follows the list)
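
As a reference for the mean corner error quoted above, here is a minimal sketch (hypothetical helper; it assumes 128 × 128 patches, a common choice in MSCOCO-based homography benchmarks) that averages the L2 distance between the four patch corners mapped by the predicted and ground-truth homographies.

```python
import numpy as np
import cv2

def mean_corner_error(H_pred, H_gt, patch_size=128):
    """Average L2 distance (in pixels) between the four patch corners
    mapped by the predicted and the ground-truth homographies."""
    corners = np.float32([[0, 0], [patch_size, 0],
                          [patch_size, patch_size], [0, patch_size]])
    pred = cv2.perspectiveTransform(corners.reshape(-1, 1, 2),
                                    H_pred.astype(np.float32))
    gt = cv2.perspectiveTransform(corners.reshape(-1, 1, 2),
                                  H_gt.astype(np.float32))
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))
```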


Summary

Introduction

Estimating a 2D homography (or projective transformation) from a pair of images is a fundamental task in computer vision. In view of the powerful feature extraction and matching capabilities of CNNs, several studies have addressed homography estimation with CNNs and achieved higher accuracy than the ORB+RANSAC method. HomographyNet [22] estimated the homography between two images by predicting the displacements of a set of four points, known as the four-point homography parameterization. The model is based on the VGG architecture [23], with eight convolutional layers, a pooling layer after every two convolutions, two fully connected layers, and an L2 loss on the difference between the predicted and true four-point coordinates. Nguyen et al. [25] proposed an unsupervised learning algorithm that trained a deep CNN to estimate planar homographies, again based on the four-point parameterization. All of these studies chose the four-point parameterization because the 3 × 3 homography matrix H mixes the rotation, translation, scaling, and shear components of the homography transformation.

The contributions of this study are as follows: (1) we show that the 3 × 3 homography matrix can be learnt directly and well with the proposed STN-Homography model after pixel coordinate normalization, rather than estimating the alternative four-point homography; (2) we propose a hierarchical STN-Homography model that yields more accurate results than the state-of-the-art; and (3) we propose a sequence STN-Homography model that can be trained in an end-to-end manner and yields results superior to those obtained by the hierarchical STN-Homography model and the state-of-the-art.
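
For context on how the two parameterizations relate, the hedged sketch below (hypothetical function name) recovers the 3 × 3 matrix implied by a four-point parameterization using OpenCV's cv2.getPerspectiveTransform; the STN-Homography models described in this work instead regress the normalized 3 × 3 matrix directly.

```python
import numpy as np
import cv2

def four_point_to_homography(corners, offsets):
    """Recover the 3x3 homography implied by a four-point parameterization.

    corners: (4, 2) pixel coordinates of the patch corners in image A.
    offsets: (4, 2) predicted displacements of those corners in image B.
    """
    src = np.float32(corners)
    dst = np.float32(corners) + np.float32(offsets)
    H = cv2.getPerspectiveTransform(src, dst)  # exact solution for 4 correspondences
    return H / H[2, 2]
```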

Dataset
Architecture of STN-Homography
Training and Results
Comparison with Other Approaches
Architecture of Hierarchical STN-Homography
Training, Results and Comparison with Other Approaches
Time Consumption and Predicted Results
Sequence STN-Homography
Architecture of Sequence STN-Homography
Conclusions