Abstract

Many existing video captioning methods capture action information in a video by exploiting features extracted from an action recognition model. However, directly using these action features without an object-specific representation may not capture object interactions well. Consequently, the generated captions may not describe the actions and objects in the scene accurately. To address this issue, we propose to incorporate the action features as edge features in a graph neural network whose nodes represent objects, thereby capturing a finer visual representation of object-action-object relationships. Previous graph-based video captioning methods commonly relied on a pretrained object detection model to create the node representations; an object detector, however, may miss some important objects. To alleviate this problem, we further introduce a grid-based node representation in which the nodes are represented by features extracted from grids of the video frames, so that the important objects in the scene are captured more thoroughly. To avoid adding any complexity during inference, the knowledge of the proposed graph is transferred to another neural network via knowledge distillation. Our proposed method achieves state-of-the-art results on all metrics on two popular video captioning datasets, MSVD and MSR-VTT. The code of our proposed method is available at https://github.com/Sejong-VLI/V2T-Action-Graph-JKSUCIS-2023.
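To make the idea above concrete, the following is a minimal sketch, assuming a PyTorch-style implementation: one message-passing layer whose nodes are grid features of a frame and whose edges carry an action feature. The class name ActionEdgeGNNLayer, the tensor shapes, and the layer sizes are illustrative assumptions, not the authors' implementation (see the linked repository for that); the knowledge-distillation step to the student captioning network is likewise omitted.

```python
# Hypothetical sketch: grid-based nodes with action features on the edges.
# Shapes and layer sizes are assumptions for illustration only.
import torch
import torch.nn as nn


class ActionEdgeGNNLayer(nn.Module):
    """One graph layer: messages between grid nodes are conditioned on action edge features."""

    def __init__(self, node_dim: int, edge_dim: int, hidden_dim: int):
        super().__init__()
        # Combines sender node, receiver node, and action edge feature into a message.
        self.message_mlp = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, hidden_dim),
            nn.ReLU(),
        )
        # Updates each node from its aggregated incoming messages.
        self.update_mlp = nn.Linear(node_dim + hidden_dim, node_dim)

    def forward(self, nodes: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
        # nodes: (N, node_dim) grid-cell features of one frame
        # edges: (N, N, edge_dim) action feature attached to every node pair
        n = nodes.size(0)
        senders = nodes.unsqueeze(0).expand(n, n, -1)    # node j as sender
        receivers = nodes.unsqueeze(1).expand(n, n, -1)  # node i as receiver
        messages = self.message_mlp(torch.cat([receivers, senders, edges], dim=-1))
        aggregated = messages.mean(dim=1)                # average incoming messages per receiver
        return self.update_mlp(torch.cat([nodes, aggregated], dim=-1))


if __name__ == "__main__":
    # Example: a 7x7 grid gives 49 nodes; here one clip-level action feature is shared by all edges.
    grid_nodes = torch.randn(49, 512)
    action_feat = torch.randn(1024).expand(49, 49, -1)
    layer = ActionEdgeGNNLayer(node_dim=512, edge_dim=1024, hidden_dim=512)
    refined_nodes = layer(grid_nodes, action_feat)       # (49, 512) refined node features
    print(refined_nodes.shape)
```

In this sketch, the refined node features would be distilled into a separate captioning network at training time, so the graph computation is not needed during inference.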
