Abstract

Precise video moment retrieval is crucial for enabling users to locate specific moments within a large video corpus. This paper presents Interactive Moment Localization with Multimodal Fusion (IMF-MF), a novel model that leverages self-attention to achieve state-of-the-art performance. IMF-MF integrates query context with multimodal features, including visual and audio information, to accurately localize moments of interest. The model operates in two distinct phases: feature fusion and joint representation learning. The first phase dynamically calculates fusion weights to adapt the combination of multimodal video content, ensuring that the most relevant features are prioritized. The second phase employs bi-directional attention to tightly couple video and query features into a unified joint representation for moment localization. This joint representation captures long-range dependencies and complex patterns, enabling the model to effectively distinguish between relevant and irrelevant video segments. The effectiveness of IMF-MF is demonstrated through comprehensive evaluations on three benchmark datasets: TVR, a closed-world collection of TV episodes; Charades, an open-world collection of user-generated videos; and DiDeMo, an open-world, diverse video moment retrieval dataset. The empirical results show that the proposed approach consistently surpasses existing state-of-the-art methods in retrieval accuracy, as measured by Recall (R1, R5, R10, and R100) and Intersection-over-Union (IoU), highlighting the benefits of its interactive moment localization approach and its use of self-attention for feature representation and attention modeling.
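
The abstract does not give implementation details, but the two-phase design it describes can be illustrated with a minimal sketch. The snippet below is not the authors' code: it assumes transformer-style clip and token features, a learned gating layer for the dynamic fusion weights of phase one, and standard multi-head cross-attention in both directions for the phase-two joint representation. All module names, dimensions, and the gating scheme are illustrative assumptions.

```python
# Minimal sketch of the two-phase idea (assumed implementation, not the paper's code).
import torch
import torch.nn as nn


class DynamicFusion(nn.Module):
    """Phase 1: weight visual and audio features per clip before combining them."""

    def __init__(self, dim: int):
        super().__init__()
        # Scores how much each modality should contribute at each time step.
        self.gate = nn.Linear(2 * dim, 2)

    def forward(self, visual: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # visual, audio: (batch, num_clips, dim)
        weights = torch.softmax(self.gate(torch.cat([visual, audio], dim=-1)), dim=-1)
        return weights[..., 0:1] * visual + weights[..., 1:2] * audio


class BiDirectionalAttention(nn.Module):
    """Phase 2: couple video and query features with cross-attention in both directions."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.video_to_query = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.query_to_video = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, video: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # Video clips attend to query tokens, and query tokens attend to clips;
        # the query-aware clip representation is what a localization head would score.
        video_ctx, _ = self.video_to_query(video, query, query)
        query_ctx, _ = self.query_to_video(query, video, video)
        # Summarize the query side and broadcast it over the clip axis.
        return video_ctx + query_ctx.mean(dim=1, keepdim=True)


if __name__ == "__main__":
    B, T, L, D = 2, 16, 10, 256  # batch, clips, query tokens, feature dim
    visual, audio = torch.randn(B, T, D), torch.randn(B, T, D)
    query = torch.randn(B, L, D)
    fused = DynamicFusion(D)(visual, audio)
    joint = BiDirectionalAttention(D)(fused, query)
    print(joint.shape)  # torch.Size([2, 16, 256])
```

In this sketch, moment localization would be performed by a downstream head that scores each clip position of the joint representation against candidate start/end boundaries; that head is omitted here because the abstract does not specify it.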
