A study of evaluation metrics and datasets for video captioning

Jaehui Park, Chibon Song, Ji-Hyeong Han

doi:10.1109/iciibms.2017.8279760

Abstract

With the fast-growing interest in deep learning, various applications and machine learning tasks have emerged in recent years. Video captioning in particular is gaining a lot of attention from both the computer vision and natural language processing fields. Captions are usually generated by jointly learning from different data modalities that share common themes in the video. Learning joint representations of different modalities is very challenging because of the inherent heterogeneity of the mixed information in visual scenes, speech dialogs, music, sounds, and so on. Consequently, it is hard to evaluate the quality of video captioning results. In this paper, we introduce well-known metrics and datasets for the evaluation of video captioning. We compare the existing metrics and datasets to derive a new research proposal for the evaluation of video descriptions.

Similar Papers

Exploring deep learning approaches for video captioning: A comprehensive review
Adel Jalal Yousif ... Mohammed H Al-Jammas
e-Prime - Advances in Electrical Engineering, Electronics and Energy | VOL. 6
22 Nov 2023

A Review of Methods for Video Captioning
Abhinav Kumar ... Rejo Mathew
SSRN Electronic Journal
05 Sep 2020

Video Description
Nayyer Aafaq ... Wei Liu
ACM Computing Surveys | VOL. 52
16 Oct 2019

NLP Meets Vision for Visual Interpretation - A Retrospective Insight and Future directions
Ahmed Jamshed ... Muhammad Moazam Fraz
20 May 2021