Abstract

Multisensory systems provide complementary information that helps many machine learning approaches perceive the environment comprehensively. These systems consist of heterogeneous modalities, which have disparate characteristics and feature distributions. Thus, extracting, aligning, and fusing complementary representations from heterogeneous modalities (e.g., visual, skeleton, and physical sensors) remains challenging. To address these challenges, we draw on insights from several neuroscience studies of animal multisensory systems to develop MAVEN, a memory-augmented recurrent approach for multimodal fusion. MAVEN generates unimodal memory banks comprising spatial-temporal features and uses our proposed recurrent representation alignment approach to iteratively align and refine unimodal representations. MAVEN then applies a multimodal variational attention-based fusion approach to produce a robust multimodal representation from the aligned unimodal features. Our extensive experimental evaluations on three multimodal datasets suggest that MAVEN outperforms state-of-the-art multimodal learning approaches on the challenging human activity recognition task across all evaluation conditions (cross-subject, leave-one-subject-out, and cross-session). Our ablation studies further suggest that MAVEN significantly outperforms feed-forward fusion-based learning models (p < 0.05). Finally, MAVEN's robust performance in extracting complementary multimodal representations from occluded and noisy data suggests its applicability to real-world datasets.
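
The abstract does not give implementation details, so the following is only a minimal, hedged sketch of how a pipeline with unimodal memory banks, recurrent representation alignment, and variational attention-based fusion might be wired up in PyTorch. All module names, dimensions, and design choices here (GRU encoders, multi-head attention for alignment, Gaussian-reparameterized attention scores for fusion) are illustrative assumptions, not MAVEN's actual architecture.

```python
# Illustrative sketch only; all components below are assumptions, not the MAVEN paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class UnimodalMemoryBank(nn.Module):
    """Encodes one modality's sequence into a bank of spatial-temporal features (assumed GRU encoder)."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.encoder = nn.GRU(in_dim, hidden_dim, batch_first=True)

    def forward(self, x):                      # x: (batch, time, in_dim)
        memory, _ = self.encoder(x)            # memory: (batch, time, hidden_dim)
        return memory


class RecurrentAlignment(nn.Module):
    """Iteratively refines one modality's summary by attending over another modality's memory bank."""
    def __init__(self, hidden_dim, num_steps=3):
        super().__init__()
        self.num_steps = num_steps
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        self.update = nn.GRUCell(hidden_dim, hidden_dim)

    def forward(self, query_state, other_memory):     # query_state: (batch, hidden_dim)
        for _ in range(self.num_steps):
            q = query_state.unsqueeze(1)               # (batch, 1, hidden_dim)
            context, _ = self.attn(q, other_memory, other_memory)
            query_state = self.update(context.squeeze(1), query_state)
        return query_state


class VariationalAttentionFusion(nn.Module):
    """Fuses aligned unimodal features with attention weights sampled from a learned Gaussian."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, 1)
        self.logvar = nn.Linear(hidden_dim, 1)

    def forward(self, unimodal_feats):                 # (batch, num_modalities, hidden_dim)
        mu, logvar = self.mu(unimodal_feats), self.logvar(unimodal_feats)
        scores = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        weights = F.softmax(scores, dim=1)             # attention over modalities
        return (weights * unimodal_feats).sum(dim=1)   # fused multimodal representation


if __name__ == "__main__":
    # Toy example with two hypothetical modalities (e.g., skeleton and inertial sensor streams).
    batch, time = 4, 20
    banks = nn.ModuleList([UnimodalMemoryBank(d, 64) for d in (75, 12)])
    align = RecurrentAlignment(64)
    fuse = VariationalAttentionFusion(64)

    streams = [torch.randn(batch, time, 75), torch.randn(batch, time, 12)]
    memories = [bank(x) for bank, x in zip(banks, streams)]
    summaries = [m[:, -1] for m in memories]           # last hidden state as initial summary

    aligned = [align(summaries[0], memories[1]), align(summaries[1], memories[0])]
    fused = fuse(torch.stack(aligned, dim=1))
    print(fused.shape)                                 # torch.Size([4, 64])
```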
