Abstract

Methodologies based on deep learning offer great potential for applications that automatically generate captions or descriptions for images and video frames. Image and video captioning are considered intellectually challenging problems in imaging science. Application domains include automatic caption (or description) generation for images and videos for people with various degrees of visual impairment; the automatic creation of metadata (indexing) for images and videos for use by search engines; general-purpose robot vision systems; and many others. Each of these application domains can positively and significantly impact many other task-specific applications. This article is not meant to be a comprehensive review of image captioning; rather, it is a concise review of both image captioning and video captioning methodologies based on deep learning. This study treats image and video captioning together, emphasizing the algorithmic overlap between the two.

Highlights

  • Image processing has played, and will continue to play, an important role in science and industry

  • The science and methodology behind deep learning have existed for decades, but the increasing abundance of digital data and the availability of powerful GPUs have accelerated deep learning research in recent years

  • Several well-known CNN models [13] for object detection [1], [31], [32] and segmentation [33] are heavily used in image and video captioning architectures to extract visual information
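As a hedged sketch of the feature-extraction role these CNN models play (the kernels, shapes, and pooling here are illustrative stand-ins, not any of the cited architectures), an encoder reduces an image to a fixed-length visual feature vector that the captioning module then consumes:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution over a single-channel image (illustrative)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def extract_features(image, kernels):
    """Convolve with each kernel, apply ReLU, then global average
    pooling: one scalar per kernel, i.e. a fixed-length feature vector."""
    feats = []
    for k in kernels:
        fmap = np.maximum(conv2d(image, k), 0.0)  # ReLU non-linearity
        feats.append(fmap.mean())                 # global average pool
    return np.array(feats)

rng = np.random.default_rng(0)
image = rng.random((32, 32))                       # toy grayscale image
kernels = [rng.standard_normal((3, 3)) for _ in range(8)]
v = extract_features(image, kernels)
print(v.shape)  # (8,)
```

In a real captioning system this hand-rolled encoder is replaced by a pretrained detection or segmentation backbone, but the contract is the same: image in, fixed-length feature vector out.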


Summary

INTRODUCTION

Image processing has played, and will continue to play, an important role in science and industry. Image and video captioning demand more effort than image recognition because of the additional challenges of recognizing the objects and actions in the image and composing a succinct, meaningful sentence from the content found. Advances in this process open up enormous opportunities in many real-life application domains, such as aid to people with various degrees of visual impairment, self-driving vehicles, sign language translation, human-robot interaction, automatic video subtitling, video surveillance, and more. The contributions of this review include: the use of image captioning methods as building blocks for a video captioning system, i.e., treating image captioning as a repetitive subset of video captioning; a review of hardware requirements and software frameworks for implementing an image/video captioning architecture; and a novel application (case study) of video captioning, namely, the automatic generation of "titles" for video clips.
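The idea of treating image captioning as a repetitive subset of video captioning can be sketched as follows (the linear frame encoder and mean pooling below are illustrative stand-ins, not a model from the literature): the image encoder is applied to every frame, and the per-frame vectors are aggregated into one clip-level vector that a language decoder would condition on:

```python
import numpy as np

def frame_features(frame, W):
    """Stand-in per-frame encoder: linear map + ReLU.
    A real system would use a pretrained CNN here."""
    return np.maximum(W @ frame.ravel(), 0.0)

def clip_features(frames, W):
    """Video captioning reuses the image encoder on every frame,
    then mean-pools the frame vectors into one clip-level vector."""
    return np.mean([frame_features(f, W) for f in frames], axis=0)

rng = np.random.default_rng(1)
W = rng.standard_normal((16, 8 * 8))              # illustrative weights
frames = [rng.random((8, 8)) for _ in range(10)]  # toy 10-frame clip
v = clip_features(frames, W)
print(v.shape)  # (16,)
```

Mean pooling is the simplest aggregation choice; published systems typically replace it with a recurrent or attention-based module over the frame sequence, but the repetitive per-frame structure is the same.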

IMAGE AND VIDEO CAPTIONING
IMAGE CAPTIONING METHODOLOGIES
VIDEO CAPTIONING METHODOLOGIES
VIDEO CAPTIONING DATASETS
IMAGE AND VIDEO CAPTIONING EVALUATION METRICS
THE REQUIRED PLATFORM FOR IMPLEMENTATION
SOFTWARE REQUIREMENT
HARDWARE REQUIREMENT
CASE STUDY
Findings
CONCLUSION AND FUTURE WORK