Local features play a pivotal role in various robotic tasks, including 3D reconstruction and visual localization. Although deep learning-based local features have proven superior to their traditional counterparts, they still face challenges in practical applications due to matching failures. These challenges primarily stem from inaccuracies in keypoint localization and the limited robustness of descriptors, particularly under significant appearance changes. In this study, we introduce a novel method for local feature learning based on layer spatial attention and domain generalization. First, a keypoint extraction strategy driven by layer spatial attention over high-level features is proposed to enhance the accuracy of keypoint localization, progressing from coarse to fine granularity. Second, a new learning paradigm based on domain generalization is developed to extract local features that are resilient to variations in illumination. To enrich the domain diversity of the training dataset, a real-time, lossless Fourier transform-based domain augmentation method is introduced; it integrates seamlessly into the training process and enhances the model's adaptability to varying domains. Additionally, explicit feature alignment-based representation learning is performed, further reinforcing the extraction of domain-invariant local features. Experimental results on public datasets demonstrate that the proposed method achieves state-of-the-art performance across downstream tasks that rely on local feature matching, such as image matching, 3D reconstruction, and long-term visual localization.
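The abstract does not specify how the Fourier transform-based domain augmentation is implemented; a common realization of this idea is low-frequency amplitude transfer, where the phase of the source image (structure) is kept and the low-frequency amplitude (style and illumination) is taken from a reference image. The sketch below illustrates that general technique with NumPy; the function name, the single-channel inputs, and the band parameter `beta` are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def fourier_domain_augment(src, ref, beta=0.01):
    """Illustrative amplitude-transfer augmentation (hypothetical API).

    Keeps the phase spectrum of `src` and replaces the low-frequency
    amplitude block with that of `ref`; `beta` sets the relative size
    of the swapped band. Both inputs are 2D float arrays of equal shape.
    """
    fft_src = np.fft.fft2(src)
    fft_ref = np.fft.fft2(ref)

    amp_src, phase_src = np.abs(fft_src), np.angle(fft_src)
    amp_ref = np.abs(fft_ref)

    # Shift zero frequency to the center so the low-frequency band is central.
    amp_src = np.fft.fftshift(amp_src)
    amp_ref = np.fft.fftshift(amp_ref)

    h, w = src.shape
    bh, bw = int(h * beta), int(w * beta)
    ch, cw = h // 2, w // 2

    # Swap the central (low-frequency) amplitude block.
    amp_src[ch - bh:ch + bh + 1, cw - bw:cw + bw + 1] = \
        amp_ref[ch - bh:ch + bh + 1, cw - bw:cw + bw + 1]

    amp_src = np.fft.ifftshift(amp_src)

    # Recombine the transferred amplitude with the original phase and invert.
    augmented = np.fft.ifft2(amp_src * np.exp(1j * phase_src))
    return np.real(augmented)
```

Because the inverse FFT reconstructs the image exactly when no band is swapped, the operation is lossless in that limit, and it runs per batch at training time, which is consistent with the real-time, lossless property claimed in the abstract.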