Abstract
Micro-videos have attracted increasing attention due to their unique properties and great commercial value. Since micro-videos naturally carry multimodal information, a powerful method for learning discriminative joint multimodal representations is essential for real-world applications. Inspired by the success of attention-based neural architectures across various tasks, we propose a multimodal aggregation network (MANET) with a serial self-attention mechanism for micro-video multi-label classification. Specifically, we first propose a parallel content-dependent graph neural network (CDGNN) module, which explores category-related embeddings of micro-videos by disentangling category relations into modality-specific and modality-shared category dependency patterns. We then introduce a serial self-attention (SSA) module that transmits multimodal information in sequential order, in which an aggregation bottleneck is incorporated to better collect and condense the salient information. Experiments on a large-scale multi-label micro-video dataset demonstrate that our method achieves competitive results compared with several state-of-the-art methods.
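The abstract only sketches how the SSA module works, so the following is a minimal illustrative sketch in PyTorch of one plausible reading: a small set of learnable bottleneck tokens is carried serially through self-attention over each modality's token sequence, accumulating a condensed joint representation. The class name `SerialSelfAttention`, the dimensions, the number of bottleneck tokens, and the final mean-pooling are all hypothetical choices for illustration, not the authors' implementation.

```python
# Minimal sketch of a serial self-attention (SSA) pass with an aggregation
# bottleneck. All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class SerialSelfAttention(nn.Module):
    """Processes modality token sequences one after another (serial order),
    carrying learnable bottleneck tokens between modalities so they can
    collect and condense the salient multimodal information."""

    def __init__(self, dim=256, num_heads=4, num_bottleneck=4):
        super().__init__()
        self.bottleneck = nn.Parameter(torch.randn(1, num_bottleneck, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, modality_tokens):
        # modality_tokens: list of tensors, each (batch, seq_len, dim),
        # e.g. [visual, acoustic, textual] category-related embeddings.
        batch = modality_tokens[0].size(0)
        bn = self.bottleneck.expand(batch, -1, -1)
        for tokens in modality_tokens:
            # Concatenate the current bottleneck with this modality's tokens,
            # run self-attention, then keep only the updated bottleneck slots.
            x = torch.cat([bn, tokens], dim=1)
            out, _ = self.attn(x, x, x)
            bn = self.norm(bn + out[:, : bn.size(1)])
        # The condensed bottleneck now summarizes all modalities in sequence.
        return bn.mean(dim=1)  # (batch, dim) joint representation

# Usage with random features standing in for three modalities:
ssa = SerialSelfAttention()
feats = [torch.randn(2, 10, 256) for _ in range(3)]
joint = ssa(feats)  # -> torch.Size([2, 256])
```

Under this reading, the bottleneck acts as the only channel through which information flows from one modality to the next, which is what would force the model to condense each modality's contribution before aggregation.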