Abstract

Methodologies based on deep learning offer great potential for applications that automatically generate captions or descriptions for images and video frames. Image and video captioning are considered intellectually challenging problems in imaging science. Application domains include automatic caption (or description) generation for images and videos for people with various degrees of visual impairment; the automatic creation of metadata (indexing) for images and videos for use by search engines; general-purpose robot vision systems; and many others. Each of these application domains can positively and significantly impact many other task-specific applications. This article is not meant to be a comprehensive review of image captioning; rather, it is a concise review of both image captioning and video captioning methodologies based on deep learning. This study treats image and video captioning together, emphasizing the algorithmic overlap between the two.

Highlights

  • Image processing has played, and will continue to play, an important role in science and industry

  • The science and methodology behind deep learning have existed for decades, but the increasing abundance of digital data and the availability of powerful GPUs have accelerated deep learning research in recent years

  • Several well-known CNN models [13] for object detection [1], [31], [32] and segmentation [33] are heavily used in image and video captioning architectures to extract visual information
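As a hedged sketch of the feature-extraction role these CNN models play (the kernels, shapes, and pooling here are illustrative stand-ins, not any of the cited architectures), an encoder reduces an image to a fixed-length visual feature vector that the captioning module then consumes:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution over a single-channel image (illustrative)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def extract_features(image, kernels):
    """Convolve with each kernel, apply ReLU, then global average
    pooling: one scalar per kernel, i.e. a fixed-length feature vector."""
    feats = []
    for k in kernels:
        fmap = np.maximum(conv2d(image, k), 0.0)  # ReLU non-linearity
        feats.append(fmap.mean())                 # global average pool
    return np.array(feats)

rng = np.random.default_rng(0)
image = rng.random((32, 32))                       # toy grayscale image
kernels = [rng.standard_normal((3, 3)) for _ in range(8)]
v = extract_features(image, kernels)
print(v.shape)  # (8,)
```

In a real captioning system this hand-rolled encoder is replaced by a pretrained detection or segmentation backbone, but the contract is the same: image in, fixed-length feature vector out.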


Summary

INTRODUCTION

Image processing has played, and will continue to play, an important role in science and industry. Image and video captioning demand more effort than image recognition because of the additional challenges of recognizing the objects and actions in the image and composing a succinct, meaningful sentence from the content found. Advances in this process open up enormous opportunities in many real-life application domains, such as aid to people with various degrees of visual impairment, self-driving vehicles, sign language translation, human-robot interaction, automatic video subtitling, video surveillance, and more. The contributions of this review include: the use of image captioning methods as building blocks for a video captioning system, i.e., treating image captioning as a repetitive subset of video captioning; a review of hardware requirements and software frameworks for implementing an image/video captioning architecture; and a novel application (case study) of video captioning, namely, the automatic generation of "titles" for video clips.
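The idea of treating image captioning as a repetitive subset of video captioning can be sketched as follows (the linear frame encoder and mean pooling below are illustrative stand-ins, not a model from the literature): the image encoder is applied to every frame, and the per-frame vectors are aggregated into one clip-level vector that a language decoder would condition on:

```python
import numpy as np

def frame_features(frame, W):
    """Stand-in per-frame encoder: linear map + ReLU.
    A real system would use a pretrained CNN here."""
    return np.maximum(W @ frame.ravel(), 0.0)

def clip_features(frames, W):
    """Video captioning reuses the image encoder on every frame,
    then mean-pools the frame vectors into one clip-level vector."""
    return np.mean([frame_features(f, W) for f in frames], axis=0)

rng = np.random.default_rng(1)
W = rng.standard_normal((16, 8 * 8))              # illustrative weights
frames = [rng.random((8, 8)) for _ in range(10)]  # toy 10-frame clip
v = clip_features(frames, W)
print(v.shape)  # (16,)
```

Mean pooling is the simplest aggregation choice; published systems typically replace it with a recurrent or attention-based module over the frame sequence, but the repetitive per-frame structure is the same.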

IMAGE AND VIDEO CAPTIONING
IMAGE CAPTIONING METHODOLOGIES
VIDEO CAPTIONING METHODOLOGIES
VIDEO CAPTIONING DATASETS
IMAGE AND VIDEO CAPTIONING EVALUATION METRICS
THE REQUIRED PLATFORM FOR IMPLEMENTATION
SOFTWARE REQUIREMENT
HARDWARE REQUIREMENT
CASE STUDY
Findings
CONCLUSION AND FUTURE WORK