A latent topic‐aware network for dense video captioning

Tao Xu, Xinyu He, Caihua Liu, Yuanyuan Cui

doi:10.1049/cvi2.12195

Abstract

Multiple events in a long untrimmed video tend to be similar and continuous. These characteristics can be regarded as a form of topic-level semantic information, which may manifest as the same sport, similar scenes, the same objects, etc. Inspired by this, a novel latent topic‐aware network (LTNet) is proposed in this article. The LTNet explores latent themes within videos and generates more continuous captions. First, a global visual topic finder detects the similarity among events and obtains latent topic‐level features. Second, a latent topic‐oriented relation learner further enhances the topic‐level representations by capturing the relationship between each event and the video themes. Benefiting from the finder and the learner, the caption generator can predict more accurate and coherent descriptions. The effectiveness of the proposed method is demonstrated on the ActivityNet Captions and YouCook2 datasets, where LTNet shows relative performance gains of over 3.03% and 0.50% in CIDEr score, respectively.
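The abstract does not specify how the finder and the learner are implemented, so the following is only a minimal sketch of the two-stage idea under stated assumptions. Similarity-weighted pooling stands in for the global visual topic finder, and scaled dot-product attention with a residual connection stands in for the topic‐oriented relation learner. All function names, the toy feature vectors, and both mechanisms are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def find_topic_features(event_feats):
    """Assumed stand-in for a topic finder: for each event, pool all
    event features weighted by their cosine similarity to that event."""
    topics = []
    for event in event_feats:
        weights = softmax([cosine(event, other) for other in event_feats])
        topics.append(weighted_sum(weights, event_feats))
    return topics


def relate_events_to_topics(event_feats, topic_feats):
    """Assumed stand-in for a relation learner: each event attends over
    the topic features and adds the attended context as a residual."""
    scale = math.sqrt(len(event_feats[0]))
    enhanced = []
    for event in event_feats:
        weights = softmax([dot(event, t) / scale for t in topic_feats])
        context = weighted_sum(weights, topic_feats)
        enhanced.append([x + c for x, c in zip(event, context)])
    return enhanced


# Toy event features (hypothetical): events 0 and 1 are similar, event 2 differs.
events = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.3], [0.0, 1.0, 0.1]]
topics = find_topic_features(events)
enhanced = relate_events_to_topics(events, topics)
```

In this toy setup the topic feature for the first event is dominated by the two similar events, which is the kind of shared theme a caption generator could then condition on. A real model would learn these weights rather than compute them from fixed similarities.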

Journal: IET Computer Vision
Publication Date: Aug 29, 2023
License type: CC BY-NC 4.0

Similar Papers

Event-Centric Hierarchical Representation for Dense Video Captioning
Teng Wang ... Haifeng Hu
IEEE Transactions on Circuits and Systems for Video Technology | VOL. 31
13 Aug 2020

Context-aware network with foreground recalibration for grounding natural language in video
Cheng Chen ... Xiaodong Gu
Neural Computing and Applications | VOL. 33
26 Feb 2021

MPP-net: Multi-perspective perception network for dense video captioning
Yiwei Wei ... Zhiling Yan
Neurocomputing | VOL. 552
07 Jul 2023

Thai Scene Graph Generation from Images and Applications
Panida Khuphiran ... Supasit Kajkamhaeng
01 Oct 2019
