Image keypoint detection and feature matching are fundamental steps in computer vision tasks. However, variations in environment, time, and viewpoint challenge the stability of keypoint detection and matching. Most traditional and deep learning-based methods cannot accurately and efficiently extract highly repeatable keypoints and robust match pairs in low-luminance environments. Therefore, we propose a two-step ‘detection + matching’ framework, in which each step is implemented by a deep neural network. Firstly, we design a self-supervised robust keypoint detection network that uses multi-scale, multi-angle, and multi-luminance transformations to create pseudo-labeled datasets, improving the model’s keypoint repeatability and luminance invariance. Secondly, we propose a descriptor-free cross-fusion matching network that uses a cross-fusion attention mechanism to establish connections between keypoint-centered image patches and recasts feature matching as an image-patch assignment task, improving matching accuracy and efficiency. Thirdly, the proposed framework replaces traditional SIFT in structure-from-motion (SfM). Experimental results on test datasets show that the self-supervised robust keypoint detection network achieves higher keypoint repeatability in low-luminance environments than SIFT, ORB, LIFT, and SuperPoint. The descriptor-free cross-fusion matching network achieves higher mean matching accuracy and efficiency than the mainstream SuperGlue algorithm. SfM with the proposed framework also performs better in terms of sparse point-cloud size and reconstruction accuracy.
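The abstract does not give implementation details, but the pseudo-labeling idea it describes can be illustrated with a minimal sketch: detect keypoints under random scale, rotation, and luminance transforms, warp the detections back to the original frame, and keep locations that survive many transforms as pseudo-labels. The function and parameter names below (`pseudo_label_heatmap`, `luminance_jitter`, the transform ranges, and the use of SIFT as a base detector) are illustrative assumptions, not the paper's actual pipeline.

```python
import cv2
import numpy as np

def luminance_jitter(img, gamma):
    """Gamma-based luminance change (gamma > 1 darkens the image)."""
    lut = np.clip(((np.arange(256) / 255.0) ** gamma) * 255.0, 0, 255).astype(np.uint8)
    return cv2.LUT(img, lut)

def pseudo_label_heatmap(img, base_detector, n_rounds=16, rng=None):
    """Aggregate detections under random scale/rotation/luminance transforms
    back into the original frame to build a repeatability heatmap."""
    rng = rng or np.random.default_rng(0)
    h, w = img.shape[:2]
    heat = np.zeros((h, w), np.float32)
    for _ in range(n_rounds):
        scale = rng.uniform(0.7, 1.3)   # assumed multi-scale range
        angle = rng.uniform(-45, 45)    # assumed multi-angle range
        gamma = rng.uniform(0.5, 2.5)   # >1 simulates low luminance
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)  # 2x3 affine
        warped = luminance_jitter(cv2.warpAffine(img, M, (w, h)), gamma)
        kps = base_detector.detect(warped, None)
        if not kps:
            continue
        pts = np.float32([kp.pt for kp in kps]).reshape(-1, 1, 2)
        # Map detections back to the original image frame
        back = cv2.transform(pts, cv2.invertAffineTransform(M)).reshape(-1, 2)
        for x, y in back:
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < w and 0 <= yi < h:
                heat[yi, xi] += 1.0
    return heat / max(heat.max(), 1e-6)  # normalized repeatability map

# Usage: points that persist across many transforms become training targets
img = cv2.imread("sample.png", cv2.IMREAD_GRAYSCALE)
heat = pseudo_label_heatmap(img, cv2.SIFT_create(), n_rounds=16)
pseudo_keypoints = np.argwhere(heat > 0.5)  # (row, col) of repeatable points
```

The resulting heatmap would supervise the detection network so that it learns to fire on points that remain stable under luminance, scale, and rotation changes, which is the repeatability property the abstract claims.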