Abstract Low-light visual perception tasks, such as nighttime Simultaneous Localization and Mapping (SLAM) or Structure-from-Motion (SfM), have attracted increasing attention, and the performance of keypoint detection and local feature description plays a crucial role in them. Many traditional algorithms and machine learning methods have been widely used to detect and describe local features. However, existing techniques degrade drastically in extremely low-light scenes, so the requirements of downstream practical applications cannot be met. Therefore, an efficient self-supervised deep learning model, DarkMatcher, is proposed, which detects and describes features in images of extremely dark environments directly, in an end-to-end manner. The model consists of a backbone built from new dynamic deformable convolutional blocks and a novel DarkMatcher module that combines multiple attention mechanisms to enable cross-scale feature interaction. The former enhances the model's feature extraction capability for low-light images; the latter strengthens matching in extremely dark environments and weakly textured regions, further improving feature matching accuracy in low-light scenes. In addition, transfer learning and real-time training strategies are used to enhance the generalization and feature representation capabilities of the model. Extensive experimental results indicate that DarkMatcher achieves the best feature matching performance and robustness in extremely dark environments, with an average matching accuracy of 71.24% and an average execution time of 51 ms per image pair. Visual pose estimation experiments provide further validation with good results.
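The abstract does not specify the internal design of the dynamic deformable convolutional blocks. As an illustrative sketch only, the snippet below shows one common way such a block can be realized with torchvision's modulated `DeformConv2d`: a plain convolution predicts per-pixel sampling offsets and modulation masks that steer the deformable kernel. All names (`DeformableBlock`, `offset_mask`) and channel sizes are hypothetical placeholders, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableBlock(nn.Module):
    """Hypothetical modulated deformable conv block (DCNv2-style sketch)."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.k = k
        # Predict 2 sampling offsets (dy, dx) plus 1 modulation scalar
        # for each of the k*k kernel locations, at every output pixel.
        self.offset_mask = nn.Conv2d(in_ch, 3 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, k, padding=k // 2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        om = self.offset_mask(x)
        n = 2 * self.k * self.k
        offset = om[:, :n]               # (B, 2*k*k, H, W) sampling offsets
        mask = torch.sigmoid(om[:, n:])  # (B, k*k, H, W) weights in [0, 1]
        return self.act(self.deform(x, offset, mask))


# Usage example with placeholder shapes:
block = DeformableBlock(32, 64)
y = block(torch.randn(1, 32, 48, 48))  # -> torch.Size([1, 64, 48, 48])
```

Letting the offsets depend on the input is what makes the receptive field adaptive, which plausibly helps in low-light images where informative structure is sparse and unevenly distributed; the paper's actual block design may differ.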