Abstract

Language understanding is multimodal. In human communication, messages are conveyed not only by words in textual form, but also through the speakers' speech patterns, gestures and facial expressions. It is therefore crucial to fuse information from different modalities to achieve a joint comprehension. With the rapid progress of deep learning, neural networks have emerged as the most popular approach to multimodal data fusion [1, 6, 7, 12]. While these models can effectively combine multimodal features by learning from data, they do not explicitly exhibit how the different modalities are related to each other, owing to the inherently low interpretability of neural networks [2]. Meanwhile, Quantum Theory (QT) has given rise to principled approaches that incorporate interactions between textual features into a holistic textual representation [3, 5, 8, 10], where the concepts of superposition and entanglement are widely exploited to formulate such interactions, and the resulting models have shown advantages in capturing complicated correlations between textual features. We hereby propose research on quantum-inspired multimodal data fusion, claiming that this limitation of neural multimodal fusion can be tackled by quantum-driven models. In particular, we propose to employ superposition to formulate intra-modal interactions, while the interplay between different modalities is expected to be captured by entanglement measures. In this way, the interactions within multimodal data can be rendered explicitly in a unified quantum formalism, improving both performance and interpretability on concrete multimodal tasks. It will also expand the application domains of quantum theory to multimodal tasks, where only preliminary efforts have been made [11]. We therefore aim to answer the following research question: RQ. Can we fuse multimodal data with quantum-inspired models? To answer this question, we propose to fuse multimodal data with complex-valued neural networks, motivated by the theoretical link between neural networks and quantum theory [4] and by recent advances in complex-valued neural networks [9]. Our model begins with a separate complex-valued embedding learned for each modality, following existing works [5, 10], which inherently assumes superposition between intra-modal features. We then construct a many-body system in an entangled state for the multimodal data, where cross-modal interactions are naturally reflected by entanglement measures, and apply quantum measurement operators to the entangled state to address the concrete multimodal task at hand. The whole process is implemented as a complex-valued neural network, which learns from data how multimodal features are combined and, at the same time, explains the combination by means of quantum superposition and entanglement measures. We plan to evaluate the proposed models on CMU-MOSI [12] and CMU-MOSEI [1], two benchmark multimodal sentiment analysis datasets that target classifying sentiment into 2, 5 or 7 classes from textual, visual and acoustic inputs. We expect comparable effectiveness to state-of-the-art models, and we will examine superposition and entanglement measures to better understand the inter-modal interactions.
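
To make the proposed pipeline concrete, the sketch below illustrates its main ingredients in plain NumPy: complex-valued unimodal states as superpositions over feature basis states, a joint bimodal state whose non-separability is quantified by the von Neumann entropy of its reduced density matrix, and projective measurements whose outcome probabilities would feed a classification head. This is a minimal illustration under our own assumptions, not the learned model itself; the function names, the coupling matrix and the random features are hypothetical placeholders for trainable components of the complex-valued network.

```python
# Minimal sketch (illustrative, not the trained model) of quantum-inspired fusion:
# superposed unimodal states, an entangled joint state, an entanglement measure,
# and projective measurements producing probabilities for a downstream classifier.
import numpy as np

def unimodal_state(amplitudes, phases):
    """Normalized complex state |psi> = sum_j r_j e^{i phi_j} |j>,
    i.e. a superposition over unimodal basis features."""
    psi = amplitudes * np.exp(1j * phases)
    return psi / np.linalg.norm(psi)

def joint_state(psi_a, psi_b, coupling=None):
    """Combine two unimodal states into a joint bimodal state.
    With coupling=None (all-ones) this is a product state; a non-uniform
    coupling matrix generally makes the joint state entangled."""
    if coupling is None:
        coupling = np.ones((len(psi_a), len(psi_b)))
    joint = coupling * np.outer(psi_a, psi_b)   # amplitude matrix C_jk
    return joint / np.linalg.norm(joint)

def entanglement_entropy(joint):
    """Von Neumann entropy of the reduced density matrix, used here as the
    cross-modal interaction (entanglement) measure."""
    s = np.linalg.svd(joint, compute_uv=False)  # Schmidt coefficients
    p = s**2
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

def measure(joint, projectors):
    """Apply measurement operators to the joint state; the resulting
    probabilities would be passed to a classification head."""
    v = joint.ravel()
    rho = np.outer(v, v.conj())                 # pure-state density matrix
    return np.array([np.real(np.trace(P @ rho)) for P in projectors])

# Toy usage with random features standing in for learned embeddings.
rng = np.random.default_rng(0)
psi_t = unimodal_state(rng.random(4), rng.random(4) * 2 * np.pi)   # textual
psi_v = unimodal_state(rng.random(4), rng.random(4) * 2 * np.pi)   # visual
joint = joint_state(psi_t, psi_v, coupling=rng.random((4, 4)))
print("entanglement:", entanglement_entropy(joint))

# Two rank-1 projectors onto random measurement states (illustrative only).
basis = np.linalg.qr(rng.random((16, 2)) + 1j * rng.random((16, 2)))[0]
projs = [np.outer(basis[:, k], basis[:, k].conj()) for k in range(2)]
print("measurement probabilities:", measure(joint, projs))
```

In the full model, the coupling and the measurement projectors would be learned end-to-end by the complex-valued network, while the entanglement entropy remains available as an interpretable readout of cross-modal interaction strength.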
