Infrared and visible remote sensing image registration is significant for utilizing remote sensing images to obtain scene information. However, it is difficult to establish a large number of correct matches due to the difficulty in obtaining similarity metrics due to the presence of radiation variation between heterogeneous sensors, which is caused by different imaging principles. In addition, the existence of sparse textures in infrared images as well as in some scenes and the small number of relevant trainable datasets also hinder the development of this field. Therefore, we combined data-driven and knowledge-driven methods to propose a Radiation-variation Insensitive, Zero-shot learning-based Registration (RIZER). First, RIZER, as a whole, adopts a detector-free coarse-to-fine registration framework, and the data-driven methods use a Transformer based on zero-shot learning. Next, the knowledge-driven methods are embodied in the coarse-level matches, where we adopt the strategy of seeking reliability by introducing the HNSW algorithm and employing a priori knowledge of local geometric soft constraints. Then, we simulate the matching strategy of the human eye to transform the matching problem into a model-fitting problem and employ a multi-constrained incremental matching approach. Finally, after fine-level coordinate fine tuning, we propose an outlier culling algorithm that only requires very few iterations. Meanwhile, we propose a multi-scene infrared and visible remote sensing image registration dataset. After testing, RIZER achieved a correct matching rate of 99.55% with an RMSE of 1.36 and had an advantage in the number of correct matches, as well as a good generalization ability for other multimodal images, achieving the best results when compared to some traditional and state-of-the-art multimodal registration algorithms.
Read full abstract