A survey of multimodal machine learning

Peng Chen, Qing Li, Yuhang Yang, Zheng Cai, Dezheng Zhang, Zhigang Lü

doi:10.13374/j.issn2095-9389.2019.03.21.003

Abstract

“Big data” is typically collected from different sources that have different data structures. With the rapid development of information technologies, currently valuable data resources are characteristically multimodal. As a result, multi-modal learning, which builds on classical machine learning strategies, has become a valuable research topic that enables computers to process and understand “big data”. Human cognition involves perception through different sense organs. Signals from the eyes, ears, nose, and hands (tactile sense) together constitute a person’s understanding of a specific scene or of the world as a whole. It is reasonable to believe that multi-modal methods with a greater ability to process complex heterogeneous data can further promote the progress of information technologies. The concept of multimodality originated in psychology and pedagogy hundreds of years ago and has become popular in computer science during the past decade. In contrast to the concept of “media”, a “mode” is a more fine-grained concept that is associated with a typical data source or data form. The effective use of multi-modal data can help a computer understand a specific environment in a more holistic way. In this context, we first introduce the definition and main tasks of multi-modal learning. On this basis, the mechanism and origin of multi-modal machine learning are then briefly introduced. Subsequently, statistical learning methods and deep learning methods for multi-modal tasks are comprehensively summarized. We also introduce the main styles of data fusion in multi-modal perception tasks, including feature representation, shared mapping, and co-training. Additionally, novel adversarial learning strategies for cross-modal matching or generation are reviewed. The main methods for multi-modal learning are outlined in this paper, with a focus on future research issues in this field.
