Abstract

With the rapid growth of video data and the increasing demands of cross-modal applications such as intelligent video search and assistance for visually impaired people, the video captioning task has recently received considerable attention in the computer vision and natural language processing fields. State-of-the-art video captioning methods focus on encoding temporal information but lack effective ways to remove irrelevant temporal information, and they also neglect spatial details. In particular, current unidirectional video encoders can be negatively affected by irrelevant temporal information, especially at the beginning and end of a video. In addition, disregarding detailed spatial features may lead to incorrect word choices during decoding. In this paper, we propose a novel recurrent video encoding method and a novel visual spatial feature for the video captioning task. The recurrent encoding module encodes the video twice, using a predicted key frame to discard the irrelevant temporal information that often occurs at the beginning and end of a video. The proposed spatial features represent spatial information from different regions of a video and provide the decoder with more detailed information. Experiments on two benchmark datasets demonstrate the superior performance of the proposed method.
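To make the two-pass idea concrete, the following is a minimal sketch (not the authors' implementation) of a recurrent encoder that first encodes the full clip, predicts a key frame, and then re-encodes a window around that key frame so that frames far from it at the clip boundaries are dropped. All class, variable, and hyperparameter names here (TwoPassVideoEncoder, key_scorer, the window size) are hypothetical illustrations, not names from the paper.

```python
import torch
import torch.nn as nn

class TwoPassVideoEncoder(nn.Module):
    """Hypothetical sketch of a two-pass (recurrent) video encoder."""

    def __init__(self, feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Scores each frame; the argmax is treated as the key frame.
        self.key_scorer = nn.Linear(hidden_dim, 1)

    def forward(self, frames):                      # frames: (B, T, feat_dim)
        # Pass 1: encode the full clip and score every frame.
        h1, _ = self.gru(frames)                    # (B, T, hidden_dim)
        scores = self.key_scorer(h1).squeeze(-1)    # (B, T)
        key_idx = scores.argmax(dim=1)              # (B,)

        # Pass 2: re-encode a window centred on the predicted key frame,
        # discarding frames far from it -- a stand-in for removing
        # irrelevant content at the start and end of the video.
        half = frames.size(1) // 4                  # hypothetical window size
        outs = []
        for b in range(frames.size(0)):
            lo = max(0, key_idx[b].item() - half)
            hi = min(frames.size(1), key_idx[b].item() + half + 1)
            _, last = self.gru(frames[b:b + 1, lo:hi])
            outs.append(last.squeeze(0))            # (1, hidden_dim)
        return torch.cat(outs, dim=0)               # (B, hidden_dim)

# Usage with dummy frame features: 2 clips of 32 frames each.
encoder = TwoPassVideoEncoder()
video = torch.randn(2, 32, 2048)
clip_repr = encoder(video)                          # (2, 512)
```

In a real system the key-frame predictor would be trained jointly with the captioning loss, and the clip representation would feed a language decoder; this sketch only illustrates the encode-predict-re-encode control flow described in the abstract.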
