A Hierarchical Spatial-Temporal Cross-Attention Scheme for Video Summarization Using Contrastive Learning.

Xiaoyu Teng, Huilan Jiang, Xiaolin Gui, Pan Xu, Yang Liu, Jian An, Jianglei Tong

doi:10.3390/s22218275

Open Access

https://doi.org/10.3390/s22218275

Abstract

Video summarization (VS) is a widely used technique for efficient browsing, fast comprehension, and effective retrieval of video content. Certain properties of new video data, such as the lack of a prominent focus and blurred boundaries between developing themes, undermine conventional approaches built on video feature information and introduce new challenges for extracting the depth and breadth features of video. In addition, the diversity of user requirements makes accurate keyframe screening harder. To overcome these challenges, this paper proposes a hierarchical spatial-temporal cross-attention scheme for video summarization based on contrastive learning. Graph attention networks (GAT) and a multi-head convolutional attention cell are used to extract local and depth features, while a GAT-adjusted bidirectional ConvLSTM (DB-ConvLSTM) is used to extract global and breadth features. Furthermore, a spatial-temporal cross-attention-based ConvLSTM is developed to merge the hierarchical features and to screen more accurately within clusters of similar keyframes. Verification experiments and comparative analysis demonstrate that the proposed method outperforms the state-of-the-art methods it is compared against.
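
The abstract names two generic building blocks, cross-attention between feature streams and contrastive learning, without giving their formulas. The following is a minimal NumPy sketch of both ideas, assuming standard scaled dot-product attention and an InfoNCE-style loss. It is not the authors' ConvLSTM-based implementation, and the function names (`cross_attention`, `info_nce`), the feature shapes, and the `temperature` parameter are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(local_feats, global_feats):
    """Generic scaled dot-product cross-attention (assumed form).

    local_feats:  (T, d) per-frame "local" features, used as queries.
    global_feats: (T, d) per-frame "global" features, used as keys and values.
    Returns the fused features (T, d) and the attention weights (T, T).
    """
    d = local_feats.shape[-1]
    scores = local_feats @ global_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    fused = weights @ global_feats
    return fused, weights

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor (assumed form).

    The loss is small when the anchor is closer (in cosine similarity)
    to the positive than to any of the negatives.
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    sims = [cos(anchor, positive)] + [cos(anchor, n) for n in negatives]
    logits = np.array(sims) / temperature
    logits = logits - logits.max()       # stabilize the exponentials
    # Negative log-softmax of the positive entry (index 0).
    return -(logits[0] - np.log(np.exp(logits).sum()))

# Toy usage with random features for 5 frames of dimension 8.
rng = np.random.default_rng(0)
local = rng.normal(size=(5, 8))
global_ = rng.normal(size=(5, 8))
fused, weights = cross_attention(local, global_)
loss = info_nce(fused[0], fused[1], [fused[2], fused[3]])
print(fused.shape, weights.shape, float(loss))
```

In this sketch the local stream attends over the global stream, so each fused frame feature is a weighted mix of global features; a contrastive loss of this kind could then be used to pull similar keyframe features together and push dissimilar ones apart.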

Journal: Sensors (Basel, Switzerland)
Publication Date: Oct 28, 2022
Citations: 2
License type: CC BY 4.0

Similar Papers

FA-Net: A Fused Feature for Multi-Head Attention Recoding Network for Pear Leaf Nutritional Deficiency Diagnosis with Visual RGB-Image Depth and Shallow Features
Yi Song ... Yuan Rao
Sensors | VOL. 23
05 May 2023

RGB-D Image Saliency Detection Based on Cross-Model Feature Fusion
Zheng Chen ... Mingchen Yin
Journal of Computer-Aided Design & Computer Graphics | VOL. 33
01 Nov 2021

Deep Learning Assists Surveillance Experts: Toward Video Data Prioritization
Tanveer Hussain ... Samee Ullah Khan
IEEE Transactions on Industrial Informatics | VOL. 19
01 Jul 2023

An Adaptive Domain Adaptation Method for Rolling Bearings’ Fault Diagnosis Fusing Deep Convolution and Self-Attention Networks
Xiao Yu ... Zhongting Liang
IEEE Transactions on Instrumentation and Measurement | VOL. 72
01 Jan 2023
