Abstract

The precise estimation of camera position and orientation is a crucial step in most machine vision tasks, especially visual localization. To address the weakness of local features in changing scenes and the difficulty of realizing a robust end-to-end network spanning feature detection to matching, an invariant local feature matching method for image pairs of changing scenes is proposed: a single network that integrates feature detection, descriptor construction, and feature matching. In the feature point detection and descriptor construction stage, the two tasks are trained jointly with a neural network. To obtain local features that are robust to viewpoint and illumination changes, the Vector of Locally Aggregated Descriptors based on Neural Network (NetVLAD) module is introduced to compute the degree of correlation between description vectors of one image and those of its counterpart. Then, to strengthen the relationships between relevant local features of an image pair, an attentional graph neural network (AGNN) is introduced, and the Sinkhorn algorithm is used to match the features; finally, the local feature matching results between the image pair are output. The experimental results show that, compared with existing algorithms, the proposed method improves the robustness of local features to varying scenes, performs better in homography estimation, matching precision, and recall, and satisfies the environmental requirements of a visual localization system while realizing the task as an end-to-end network.
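The Sinkhorn matching step mentioned above can be sketched as follows. This is a minimal, generic log-domain Sinkhorn normalization that turns a raw similarity matrix into a near-doubly-stochastic soft assignment; the temperature and iteration count are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def sinkhorn(scores, n_iters=50, temperature=0.1):
    """Normalize a raw similarity matrix into a soft assignment matrix
    by alternately normalizing rows and columns in log space."""
    log_p = scores / temperature
    for _ in range(n_iters):
        # Row normalization: each row sums to 1 (in log space).
        log_p = log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        # Column normalization: each column sums to 1.
        log_p = log_p - np.logaddexp.reduce(log_p, axis=0, keepdims=True)
    return np.exp(log_p)

# Toy example: similarities between 4 descriptors in each image.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))
p = sinkhorn(scores)
```

After a few dozen iterations both row and column sums are close to one, so each row of `p` can be read as a soft match distribution for one feature.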

Highlights

  • Excellent matching performance between local features of changing-scene images ensures stable adjacent-frame matching in visual localization

  • Standard traditional and improved feature extraction methods [2,3,4,5] have been widely used in visual localization systems, and the Markov model [6] has been widely used in traditional route planning problems

  • Speeded-Up Robust Features (SURF) [4] improves on the Scale-Invariant Feature Transform (SIFT) algorithm: it converts the input image into an integral image, transforms the image with the Hessian matrix, and extracts local feature points through non-maximum suppression, addressing SIFT's high computational complexity and time cost
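The integral-image trick in the last highlight can be sketched as follows; a minimal NumPy illustration (not tied to any particular SURF implementation) of how any rectangular box sum is recovered from four corner lookups in O(1).

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows then columns; entry (y, x) holds the
    sum of all pixels in img[:y+1, :x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1+1, x0:x1+1] via four corner lookups."""
    total = ii[y1, x1]
    if y0 > 0:
        total -= ii[y0 - 1, x1]
    if x0 > 0:
        total -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total

img = np.arange(16, dtype=np.float64).reshape(4, 4)
ii = integral_image(img)
# box_sum(ii, 1, 1, 2, 2) equals img[1:3, 1:3].sum() = 5+6+9+10 = 30
```

Because every box filter becomes constant-time regardless of its size, SURF can approximate Hessian responses at many scales without rebuilding image pyramids.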



Introduction

Excellent matching performance between local features of changing-scene images ensures stable adjacent-frame matching in visual localization. The Scale-Invariant Feature Transform (SIFT) [2] constructs a Gaussian pyramid of the input image, queries the extreme points, and determines the feature points' positions. Speeded-Up Robust Features (SURF) [4] improves on the SIFT algorithm: it converts the input image into an integral image, transforms the image with the Hessian matrix, and extracts local feature points through non-maximum suppression, addressing SIFT's high computational complexity and time cost. Although conventional feature detection methods such as SIFT and SURF have been widely used in machine vision, they cannot capture semantic features and lack strong robustness to changes in scene viewpoint, scale, and weather or illumination. With the introduction of deep learning, models trained on large amounts of data can remain robust to such changes in the natural environment.
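The SIFT-style detection described above (Gaussian pyramid, then scale-space extrema) can be sketched as follows; a simplified difference-of-Gaussians detector with illustrative scales and threshold, omitting SIFT's subpixel refinement, edge rejection, and descriptor stages.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_keypoints(img, sigmas=(1.0, 1.6, 2.6, 4.2), thresh=0.01):
    """Blur at increasing scales, subtract adjacent blurs, and keep
    points that are extrema of their 3x3x3 scale-space neighborhood."""
    blurred = np.stack([gaussian_filter(img, s) for s in sigmas])
    dog = blurred[1:] - blurred[:-1]                      # DoG stack
    is_max = maximum_filter(dog, size=3) == dog           # local maxima
    is_min = minimum_filter(dog, size=3) == dog           # local minima
    strong = np.abs(dog) > thresh                         # contrast test
    s, y, x = np.nonzero((is_max | is_min) & strong)
    # Drop boundary scales, which lack a full scale neighborhood.
    keep = (s > 0) & (s < dog.shape[0] - 1)
    return list(zip(y[keep], x[keep]))

# Synthetic test image: a single Gaussian blob centered at (16, 16).
yy, xx = np.mgrid[:32, :32]
img = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 2.0 ** 2))
kps = dog_keypoints(img)
```

On this synthetic image the blob center is recovered as a keypoint, because the DoG response is strongest at the scale matching the blob's size.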

