Abstract
Multi-modal fusion of remote sensing images is challenging because of the intricate imaging mechanisms and the variations in radiation across modalities. The fusion of visible-light and vegetation-sensitive images, in particular, faces these difficulties. Traditional methods have seldom considered the varied imaging mechanisms and radiometric differences between modalities, which leads to discrepancies in the corresponding features. To address this issue, we propose VAM-Net (Vegetation-Attentive Multi-modal deep Network), which combines a radiometric correction mechanism with a lightweight multi-modal adaptive feature selection method for fusing multi-modal images. First, the vegetation index (VDVI) is integrated into visible-light images to mitigate the radiometric differences between visible-light images and vegetation-sensitive images (e.g., infrared and red edge images). Then, a two-branch network incorporating attention mechanisms is designed to independently capture texture features and select similar features across the two image modalities. Last, a new loss function is presented to ensure that the learned features are suitable for multi-modal fusion. VAM-Net is evaluated on visible-light and vegetation-sensitive images from three different areas. The experimental results show that VAM-Net attains an average precision of 67.02%, a recall of 35.49%, and an average RMSE of 2.191 px, demonstrating its accuracy and robustness in multi-modal image fusion.
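As a rough illustration of the first step, the sketch below computes VDVI with its commonly used definition, (2G - R - B) / (2G + R + B), and attaches it to each visible-light pixel. The abstract does not say how the index is integrated into the image, so appending it as a fourth channel is an assumption made only for this example. The function names `vdvi` and `append_vdvi_channel` are hypothetical, and plain Python lists stand in for the image arrays a real pipeline would use.

```python
def vdvi(r, g, b, eps=1e-12):
    """Visible-band Difference Vegetation Index for a single pixel.

    Uses the common definition (2G - R - B) / (2G + R + B).
    Returns 0.0 when the denominator is (near) zero, e.g. a black pixel.
    """
    denom = 2.0 * g + r + b
    if abs(denom) < eps:
        return 0.0
    return (2.0 * g - r - b) / denom


def append_vdvi_channel(image):
    """Turn rows of (r, g, b) pixels into rows of (r, g, b, vdvi) pixels.

    Assumption: the index is attached as an extra channel. The paper may
    integrate it differently.
    """
    return [[(r, g, b, vdvi(r, g, b)) for (r, g, b) in row] for row in image]


# Tiny 2x2 image with reflectance-like values in [0, 1].
rgb = [
    [(0.2, 0.6, 0.1), (0.5, 0.5, 0.5)],  # green-dominant pixel, gray pixel
    [(0.0, 0.0, 0.0), (0.1, 0.1, 0.4)],  # black pixel, blue-dominant pixel
]
augmented = append_vdvi_channel(rgb)
```

Green-dominant pixels give positive VDVI values, gray pixels give zero, and pixels dominated by red or blue give negative values, which is what makes the index a plausible bridge between visible-light and vegetation-sensitive bands.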