Abstract

Video description plays an important role in the field of intelligent imaging technology. Attention mechanisms are extensively applied in deep-learning-based video description models. Most existing models use a temporal-spatial attention mechanism to enhance accuracy: temporal attention captures the global features of a video, whereas spatial attention captures local features. Nevertheless, because each channel of the convolutional neural network (CNN) feature maps carries its own spatial semantic information, merely dividing the CNN features into regions and applying a spatial attention mechanism is insufficient. In this paper, we propose a temporal-spatial and channel attention mechanism that enables the model to exploit diverse video features and ensures consistency between visual features and sentence descriptions, improving model performance. To demonstrate the effectiveness of the attention mechanism, we also propose a video visualization model built on the video description model. Experimental results show that our model achieves good performance on the Microsoft Video Description (MSVD) dataset and a measurable improvement on the Microsoft Research-Video to Text (MSR-VTT) dataset.

Highlights

  • Video description is widely used in advanced intelligent technology, including smart city, smart transportation and smart home [1,2,3,4,5]

  • The multi-attention video description model we proposed in this paper is shown in Figure 3 and contains temporal, spatial, and channel attention mechanisms represented by T, S, and C, respectively

  • We proposed a video description model based on temporal-spatial and channel attention


Summary

Introduction

Video description is widely used in advanced intelligent technology, including smart city, smart transportation and smart home [1,2,3,4,5]. Attention-based image captioning models weight every region of each image before each word-prediction step, so that the feature used for each prediction differs. Building on this idea, Yao [13] proposed a video description model based on a temporal attention mechanism: the model weights the features of all video frames and sums them whenever predicting a word. Our multi-attention video description model introduces a channel attention mechanism on top of the traditional temporal and spatial attention mechanisms. This model couples visual features and sentence descriptions more tightly, increasing the accuracy of the model. In our video visualization model, we perform a visual analysis of the attention mechanism and intuitively validate the model's accuracy.
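The temporal-attention step described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the function name `temporal_attention` is hypothetical, and the relevance score is simplified to a dot product between frame features and the decoder state, whereas models such as [13] typically use a small learned MLP.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_attention(frame_feats, hidden):
    """Weight per-frame features by their relevance to the current
    decoder state and return their weighted sum, producing one
    context vector per predicted word.

    frame_feats: (T, D) array of CNN features, one row per sampled frame.
    hidden:      (D,)   current decoder hidden state.
    """
    scores = frame_feats @ hidden      # (T,) relevance of each frame
    weights = softmax(scores)          # attention distribution over time
    context = weights @ frame_feats    # (D,) weighted sum of frame features
    return context, weights

# Toy usage: 5 frames with 8-dimensional features.
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 8))
h = rng.standard_normal(8)
ctx, w = temporal_attention(feats, h)
print(w.sum())  # the attention weights form a distribution (sum to 1)
```

Because the weights are recomputed from the decoder state at every step, each predicted word attends to a different mixture of frames, which is what makes the temporal features "global" in the sense used above.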

Attention Mechanism
Temporal Attention
Spatial Attention
Network Architecture
Attention Calculation
Channel Attention
Attention Visualization
Datasets and Evaluation Metrics
Experiment Setting
Analysis of Different Attention Combinations
Comparison with Methods in Other Papers
Visual Analysis and Validation
Conclusions