Abstract

Multimodal sequence analysis aims to draw inferences from visual, language, and acoustic sequences. Most existing works focus on fusing aligned sequences from the three modalities to explore inter-modal interactions, which is impractical in real-world scenarios where alignment is rarely available. To overcome this issue, we focus on analyzing unaligned sequences, which remains relatively underexplored and is also more challenging. We propose Multimodal Graph, whose novelty mainly lies in transforming the sequential learning problem into a graph learning problem. The graph-based structure enables parallel computation along the time dimension (as opposed to recurrent neural networks) and can effectively learn longer-range intra- and inter-modal temporal dependencies in unaligned sequences. First, we propose multiple ways to construct an adjacency matrix for a sequence, thereby performing a sequence-to-graph transformation. To learn intra-modal dynamics, a graph convolutional network is employed for each modality based on the defined adjacency matrix. To learn inter-modal dynamics, the commonly used word-level fusion does not apply because the unimodal sequences are unaligned. To this end, we devise graph pooling algorithms that automatically explore the associations between time slices from different modalities and hierarchically learn high-level graph representations. Multimodal Graph outperforms state-of-the-art models on three datasets under the same experimental setting.
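The abstract does not spell out the concrete constructions, so the following is only a minimal sketch of the overall pipeline it describes: a window-based temporal adjacency is one plausible (assumed) sequence-to-graph construction, a standard GCN propagation rule stands in for the per-modality graph convolution, and simple mean readout stands in for the paper's hierarchical graph pooling.

```python
import numpy as np

def window_adjacency(T, window=2):
    """One plausible sequence-to-graph construction (assumption): connect
    each time step to its neighbours within a fixed temporal window."""
    A = np.zeros((T, T))
    for i in range(T):
        lo, hi = max(0, i - window), min(T, i + window + 1)
        A[i, lo:hi] = 1.0
    np.fill_diagonal(A, 0.0)  # self-loops are added in the GCN layer
    return A

def gcn_layer(A, H, W):
    """Standard GCN propagation: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

# Toy unimodal sequence: T unaligned time slices with d features each.
rng = np.random.default_rng(0)
T, d, d_hidden = 8, 16, 32
H = rng.standard_normal((T, d))          # e.g. acoustic frame features
W = rng.standard_normal((d, d_hidden))   # learnable weights (random here)

A = window_adjacency(T, window=2)        # sequence -> graph
H1 = gcn_layer(A, H, W)                  # intra-modal dynamics
g = H1.mean(axis=0)                      # crude graph-level readout; the
                                         # paper instead learns this via
                                         # hierarchical graph pooling
print(g.shape)                           # (32,)
```

In the paper, graph pooling further coarsens and fuses the per-modality graphs to capture inter-modal associations; the mean readout above is only a placeholder for that step.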
