Accurate estimation of camera position and orientation is a key step in most machine vision tasks, especially visual localization. To address the weakness of local features in changing scenes and the difficulty of building a robust end-to-end network that covers everything from feature detection to matching, an invariant local feature matching method for image pairs of changing scenes is proposed. The method is a single network that integrates feature detection, descriptor construction, and feature matching. In the feature point detection and descriptor construction stage, joint training is carried out based on a neural network. To obtain local features that are strongly robust to viewpoint and illumination changes, the Vector of Locally Aggregated Descriptors based on Neural Network (NetVLAD) module is introduced to compute the degree of correlation between the description vectors of one image and those of its counterpart. Then, to strengthen the relationship between relevant local features of the image pair, the attentional graph neural network (AGNN) is introduced, and the Sinkhorn algorithm is used to match them; finally, the local feature matching results between the image pair are output. The experimental results show that, compared with existing algorithms, the proposed method improves the robustness of local features in changing scenes and performs better in homography estimation, matching precision, and recall, and that the end-to-end network task can be realized while meeting the environmental requirements of a visual localization system.
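The Sinkhorn matching step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a square matrix of pairwise descriptor similarity scores (the same number of keypoints in both images), it omits any handling of unmatched keypoints, and the function names, the iteration count, and the `threshold` value are hypothetical choices made for illustration only.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn(scores, n_iters=50):
    """Alternately normalize rows and columns of exp(scores) in the log domain.

    Assumes `scores` is a square list of lists (an illustrative simplification).
    Returns a soft assignment matrix whose rows and columns each sum to about 1.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    u = [0.0] * n_rows  # log-domain row scaling
    v = [0.0] * n_cols  # log-domain column scaling
    for _ in range(n_iters):
        u = [-logsumexp([scores[i][j] + v[j] for j in range(n_cols)])
             for i in range(n_rows)]
        v = [-logsumexp([scores[i][j] + u[i] for i in range(n_rows)])
             for j in range(n_cols)]
    return [[math.exp(scores[i][j] + u[i] + v[j]) for j in range(n_cols)]
            for i in range(n_rows)]


def mutual_matches(assignment, threshold=0.2):
    """Keep pairs (i, j) that are each other's best match and exceed threshold."""
    matches = []
    for i, row in enumerate(assignment):
        j = max(range(len(row)), key=row.__getitem__)
        best_i = max(range(len(assignment)), key=lambda r: assignment[r][j])
        if best_i == i and row[j] > threshold:
            matches.append((i, j))
    return matches


# Toy similarity scores: keypoint 0 matches 0, 1 matches 2, 2 matches 1.
toy_scores = [[5.0, 0.0, 0.0],
              [0.0, 0.0, 5.0],
              [0.0, 5.0, 0.0]]
toy_assignment = sinkhorn(toy_scores)
toy_matches = mutual_matches(toy_assignment)
```

In a full pipeline the score matrix would come from the learned descriptors after the graph neural network stage; here it is a hand-written toy input so that the normalization and the mutual-best-match selection can be inspected in isolation.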