Abstract

The wide deployment of multi-modal sensors across diverse areas generates vast amounts of data characterized by high volume, wide variety, and high integrity. Traditional data fusion methods, however, face immense challenges when dealing with multi-modal data that contain abundant intra-modality and cross-modality information. Deep learning can automatically extract and understand the latent associations within multi-modal information, yet a comprehensive review of the inference mechanisms underlying deep learning for multi-modal sensor fusion is still lacking. This work surveys up-to-date developments in multi-modal sensor fusion via deep learning to provide a broad picture of data fusion needs and technologies. It compares the characteristics of multi-modal data from various sensors, summarizes background concepts in data fusion and deep learning, and carefully reviews a large number of studies organized around four inference mechanisms: adaptive learning, deep generative models, deep discriminative models, and algorithm unrolling. The pros and cons of these methodologies are presented, and several popular application domains are discussed, including medical imaging, autonomous driving, remote sensing, and robotics. A large collection of recently published multi-modal datasets is presented, along with tables that quantitatively compare and summarize the performance of fusion algorithms. Finally, acknowledging the limitations of current research, we identify open challenges and future directions to guide deep learning-based multi-sensor fusion.
