Attention-guided multi-granularity fusion model for video summarization

Yunzuo Zhang, Yameng Liu, Cunyu Wu

doi:10.1016/j.eswa.2024.123568

Abstract

Video summarization has attracted extensive attention because it makes video browsing easier. Despite notable progress, existing methods still fail to model contextual information within videos sufficiently and effectively, and the resulting lack of powerful contextual representations limits summarization performance. To address this limitation, we present a novel Attention-Guided Multi-Granularity Fusion Model (AMFM), which optimizes the modeling process from the perspectives of context capture and fusion. AMFM comprises three main components: a content-aware enhancement (CAE) module, a multi-granularity encoder (MGE), and a scale-adaptive fusion (SAF) module. Specifically, CAE dynamically enhances pre-trained visual features by learning the potential visual relationships between frame-level and video-level embeddings. MGE then models coarse-grained and fine-grained contextual information simultaneously in the same representation space by combining self-attention with temporal convolution. Finally, SAF adaptively fuses the multi-granularity representations, which differ significantly in semantic scale. By effectively modeling and processing rich temporal representations, our method can precisely pinpoint key segments. Extensive comparisons with state-of-the-art methods on standard datasets demonstrate the effectiveness of the proposed method, and ablation studies further verify the positive impact of each module in our model.
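
The three-stage pipeline named in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the abstract gives no equations, so every concrete choice here is an assumption made for illustration, including mean pooling for the video-level embedding, the sigmoid gate, the depthwise kernel of width 3, the per-frame softmax fusion, the random untrained weights, and all function names (`content_aware_enhance`, `scale_adaptive_fuse`, `score_frames`, and so on).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def content_aware_enhance(frames):
    """CAE-like step (assumed form): relate each frame to a video-level embedding."""
    video = frames.mean(axis=0)                                # (D,) mean-pooled video embedding
    gate = sigmoid(frames @ video / np.sqrt(frames.shape[1]))  # (T,) per-frame gate
    return frames + gate[:, None] * video                      # residual enhancement

def self_attention(x, wq, wk, wv):
    """Coarse-grained branch: every frame attends to every other frame."""
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)     # (T, T) attention weights
    return attn @ v

def temporal_conv(x, kernel):
    """Fine-grained branch: depthwise 1D convolution over time, zero-padded to length T."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[t:t + k] * kernel).sum(axis=0) for t in range(x.shape[0])])

def scale_adaptive_fuse(coarse, fine, w_gate):
    """SAF-like step (assumed form): per-frame softmax weights over the two branches."""
    logits = np.stack([coarse @ w_gate, fine @ w_gate], axis=-1)  # (T, 2)
    w = softmax(logits, axis=-1)
    return w[:, :1] * coarse + w[:, 1:] * fine

def score_frames(frames, seed=0):
    """Run the toy pipeline with random (untrained) weights; returns one score per frame."""
    rng = np.random.default_rng(seed)
    _, d = frames.shape
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    kernel = rng.standard_normal((3, d)) / 3.0
    w_gate = rng.standard_normal(d) / np.sqrt(d)
    w_out = rng.standard_normal(d) / np.sqrt(d)
    x = content_aware_enhance(frames)
    fused = scale_adaptive_fuse(self_attention(x, wq, wk, wv), temporal_conv(x, kernel), w_gate)
    return sigmoid(fused @ w_out)  # (T,) importance-like scores in [0, 1]

if __name__ == "__main__":
    demo = np.random.default_rng(1).standard_normal((8, 16))  # 8 frames, 16-dim features
    print(score_frames(demo).shape)
```

Because the weights are random, the scores carry no meaning; the sketch only shows how a global attention branch and a local convolution branch can share one representation space and be blended per frame.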

Journal: Expert Systems with Applications
Publication Date: Feb 27, 2024
Citations: 6

Similar Papers

Attention Over Attention: An Enhanced Supervised Video Summarization Approach
Isha Puthige ... Mohit Agarwal
Procedia Computer Science | VOL. 218
01 Jan 2023

Video Summarization using Submodular Convex Optimization with Dynamic Support Vector Machine for Forest Fire Sequence Classification
B Pushpa ... M Kamarasan
01 Nov 2019

Spatial adaptive and transformer fusion network (STFNet) for low-count PET blind denoising with MRI.
Lipei Zhang ... Dong Liang
Medical Physics | VOL. 49
10 Dec 2021

DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation
Ailiang Lin ... Guangming Lu
IEEE Transactions on Instrumentation and Measurement | VOL. 71
01 Jan 2021
