Abstract
Multimodal Artificial Intelligence (Multimodal AI) generally involves multiple types of data (e.g., images, text, or data collected from different sensors), feature engineering (e.g., extraction, combination/fusion), and decision-making (e.g., majority voting). As architectures become increasingly sophisticated, multimodal neural networks can integrate feature extraction, feature fusion, and decision-making into a single model, blurring the boundaries between these processes. The conventional multimodal data fusion taxonomy (e.g., early/late fusion), which categorizes methods by the stage at which fusion occurs, is no longer suitable for the modern deep learning era. We therefore propose a new, fine-grained taxonomy based on the mainstream techniques used, grouping state-of-the-art (SOTA) models into five classes: Encoder-Decoder methods, Attention Mechanism methods, Graph Neural Network methods, Generative Neural Network methods, and other Constraint-based methods. Most existing surveys on multimodal data fusion focus on a single task with one specific combination of two modalities. In contrast, this survey covers a broader range of modality combinations, including Vision + Language (e.g., videos, text), Vision + Sensors (e.g., images, LiDAR), and so on, together with their corresponding tasks (e.g., video captioning, object detection). Moreover, we compare these methods and discuss the challenges and future directions in this area.
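To make the conventional distinction concrete, the sketch below contrasts early fusion (combine modality features, then decide once) with late fusion (decide per modality, then combine decisions). This is a minimal illustration, not code from the surveyed models; the feature extractors and decision head are hypothetical stand-ins with random weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def image_features(image):
    # Hypothetical unimodal image feature extractor (128-d embedding).
    return rng.standard_normal(128)

def text_features(text):
    # Hypothetical unimodal text feature extractor (64-d embedding).
    return rng.standard_normal(64)

def decision_head(feats, n_classes=3):
    # Hypothetical linear classifier with random weights, softmax output.
    w = rng.standard_normal((n_classes, feats.shape[0]))
    logits = w @ feats
    return np.exp(logits) / np.exp(logits).sum()

def early_fusion(image, text):
    # Early fusion: concatenate modality features, then make one decision.
    fused = np.concatenate([image_features(image), text_features(text)])
    return decision_head(fused)

def late_fusion(image, text):
    # Late fusion: one decision per modality, then combine (here, averaging).
    p_img = decision_head(image_features(image))
    p_txt = decision_head(text_features(text))
    return (p_img + p_txt) / 2

print(early_fusion("img.png", "a caption"))
print(late_fusion("img.png", "a caption"))
```

Modern end-to-end multimodal networks learn extraction, fusion, and decision-making jointly, so they rarely fall cleanly into either of these two stages, which motivates the technique-based taxonomy proposed in this survey.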