Abstract

Multimodal image fusion combines information from multiple imaging modalities to generate a composite image that contains complementary information. The task is challenging because of the heterogeneous nature of the data, misalignment and nonlinear relationships between the input images, and incomplete data during the fusion process. In recent years, several attention mechanisms have been introduced to enhance the performance of deep learning models, yet relatively little literature addresses multimodal image fusion with attention mechanisms. This paper studies and analyzes the latest deep learning approaches, including attention mechanisms, for multimodal image fusion. Based on this study, a graphical taxonomy covering the different image modalities, fusion strategies, fusion levels, and evaluation metrics for fusion tasks is put forth. The focus is on multimodal image fusion frameworks that use deep learning techniques as their core methodology. The paper also sheds light on the challenges and future research directions in this field, the main application domains, and the benchmark datasets used for multimodal fusion tasks. This work contributes to research on multimodal image fusion and can help researchers select a suitable methodology for their applications.
