Abstract

Video captioning has become a broad and active research area. Attention-based encoder-decoder methods are widely used for caption generation. However, these methods mostly rely on attentive visual features to highlight salient video regions while overlooking the semantic features of the available captions. These semantic features carry significant information that helps generate highly informative, human-like descriptions. Therefore, we propose a novel visual and semantic enhanced video captioning network, named VSVCap, that efficiently utilizes multiple ground-truth captions. We aim to generate captions that are both visually and semantically enhanced by exploiting the video and text modalities. To achieve this, we propose a fine-grained cross-graph attention mechanism that captures detailed graph-embedding correspondence between visual graphs and textual knowledge graphs. We perform node-level matching and structure-level reasoning between the weighted regional graph and the knowledge graph. The proposed network achieves promising results on three benchmark datasets, i.e., YouTube2Text, MSR-VTT, and VATEX. The experimental results show that our network accurately captures the key objects, relationships, and semantically salient events of a video to generate captions that closely resemble human annotations.
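To make the node-level matching step concrete, the sketch below shows one plausible reading of cross-graph attention between region-graph nodes and knowledge-graph nodes. This is an illustrative sketch under assumed details, not the paper's implementation: the function name, tensor shapes, scaled dot-product similarity, and the additive fusion at the end are all placeholders chosen for clarity.

```python
import torch
import torch.nn.functional as F

def node_level_matching(visual_nodes, text_nodes):
    """Soft node-level matching between a visual region graph and a
    textual knowledge graph (illustrative sketch, not the paper's code).

    visual_nodes: (Nv, d) region-node embeddings
    text_nodes:   (Nt, d) knowledge-graph node embeddings
    Returns an attended textual context for each visual node,
    plus the cross-graph attention weights.
    """
    d = visual_nodes.size(-1)
    # Scaled dot-product similarity between every visual and text node.
    sim = visual_nodes @ text_nodes.t() / d ** 0.5   # (Nv, Nt)
    attn = F.softmax(sim, dim=-1)                    # attend over text nodes
    context = attn @ text_nodes                      # (Nv, d)
    return context, attn

# Toy usage with random embeddings (all dimensions are placeholders).
visual = torch.randn(36, 512)    # e.g. 36 detected region nodes
textual = torch.randn(20, 512)   # e.g. 20 knowledge-graph nodes
ctx, weights = node_level_matching(visual, textual)
fused_nodes = visual + ctx       # one simple way to inject semantic context
```

Structure-level reasoning would then operate on `fused_nodes` together with the graphs' edge structure; that part is omitted here since the abstract does not specify it.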
