Abstract

Automatically describing the content of an image is an interesting and challenging task in artificial intelligence. In this paper, an enhanced image captioning model, comprising object detection, color analysis, and caption generation, is proposed to automatically generate textual descriptions of images. In the encoder–decoder captioning model, VGG16 serves as the encoder and a long short-term memory (LSTM) network with attention serves as the decoder. In addition, Mask R-CNN with OpenCV is used for object detection and color analysis. The generated caption and the recognized colors are then integrated to provide better descriptive details of images, and the resulting sentence is converted into speech. The validation results show that the proposed method provides more accurate descriptions of images.
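
As a concrete illustration of this encoder–decoder pipeline, the following is a minimal sketch in TensorFlow/Keras; the vocabulary size, layer widths, and the Bahdanau-style (additive) attention are assumptions for illustration, not the paper's exact configuration.

```python
import tensorflow as tf

VOCAB_SIZE = 5000  # assumed vocabulary size
EMBED_DIM = 256    # assumed word-embedding width
UNITS = 512        # assumed LSTM state width

# Encoder: VGG16 without its classifier head; the 7x7x512 feature map is
# reshaped into 49 spatial locations that the decoder can attend over.
vgg = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                  input_shape=(224, 224, 3))
encoder = tf.keras.Model(vgg.input,
                         tf.keras.layers.Reshape((49, 512))(vgg.output))

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive attention over the 49 image regions."""
    def __init__(self, units):
        super().__init__()
        self.w_feat = tf.keras.layers.Dense(units)
        self.w_hidden = tf.keras.layers.Dense(units)
        self.score = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        hidden = tf.expand_dims(hidden, 1)                    # (batch, 1, units)
        scores = self.score(tf.nn.tanh(
            self.w_feat(features) + self.w_hidden(hidden)))   # (batch, 49, 1)
        weights = tf.nn.softmax(scores, axis=1)
        context = tf.reduce_sum(weights * features, axis=1)   # (batch, 512)
        return context, weights

class Decoder(tf.keras.Model):
    """One LSTM step per word, conditioned on an attended image context."""
    def __init__(self, vocab_size, embed_dim, units):
        super().__init__()
        self.embed = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.attention = BahdanauAttention(units)
        self.lstm = tf.keras.layers.LSTM(units, return_state=True)
        self.fc = tf.keras.layers.Dense(vocab_size)

    def call(self, word_ids, features, state):
        h, c = state
        context, _ = self.attention(features, h)              # attend to regions
        x = self.embed(word_ids)                              # (batch, 1, embed)
        x = tf.concat([tf.expand_dims(context, 1), x], axis=-1)
        out, h, c = self.lstm(x, initial_state=[h, c])
        return self.fc(out), (h, c)

# One decoding step on a dummy image; [[1]] is an assumed <start> token id.
decoder = Decoder(VOCAB_SIZE, EMBED_DIM, UNITS)
features = encoder(tf.random.uniform((1, 224, 224, 3)))
state = (tf.zeros((1, UNITS)), tf.zeros((1, UNITS)))
logits, state = decoder(tf.constant([[1]]), features, state)
```

A full system would train this decoder with teacher forcing on caption data and loop the decoding step at inference until an end token is produced.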

Highlights

  • Image captioning essentially comprises two tasks: computer vision and natural language processing (NLP)

  • Computer vision helps to recognize and understand the scene presented in an image, and NLP converts this semantic knowledge into a descriptive sentence (a detection-and-color sketch follows this list)

  • Image captioning can be used in social media to automatically generate the caption for a posted image or to describe a video in real time
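
For the recognition step, the abstract pairs Mask R-CNN with OpenCV for object detection and color analysis. The sketch below is one plausible reading, assuming the TensorFlow Mask R-CNN Inception-v2 COCO model that OpenCV's DNN module can load; the file paths, the 0.5 confidence threshold, and the k-means-over-bounding-box color scheme are assumptions, not the paper's exact procedure.

```python
import cv2
import numpy as np

# Placeholder paths: a frozen TensorFlow Mask R-CNN graph and its OpenCV config.
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb",
                                    "mask_rcnn_inception_v2_coco.pbtxt")

image = cv2.imread("input.jpg")
h, w = image.shape[:2]
net.setInput(cv2.dnn.blobFromImage(image, swapRB=True))
# boxes holds class ids, scores, and normalized coordinates; masks holds the
# per-instance segmentation masks (unused in this color sketch).
boxes, masks = net.forward(["detection_out_final", "detection_masks"])

def dominant_color(region, k=3):
    """Cluster the region's BGR pixels with k-means; return the largest centre."""
    pixels = np.float32(region.reshape(-1, 3))
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 3,
                                    cv2.KMEANS_RANDOM_CENTERS)
    return centers[np.argmax(np.bincount(labels.flatten()))]

for i in range(boxes.shape[2]):
    class_id, score = int(boxes[0, 0, i, 1]), boxes[0, 0, i, 2]
    if score < 0.5:  # assumed confidence threshold
        continue
    x1, y1, x2, y2 = np.clip(boxes[0, 0, i, 3:7] * [w, h, w, h],
                             0, [w, h, w, h]).astype(int)
    region = image[y1:y2, x1:x2]
    if region.size:
        print(class_id, score, dominant_color(region))  # BGR of dominant cluster
```

The dominant BGR value can then be mapped to a color name and spliced into the generated caption, which is the integration step the abstract describes.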

Summary

Introduction

Image captioning essentially comprises two tasks: computer vision and natural language processing (NLP). Image captioning has many applications. For instance, it can aid visually challenged people in travelling independently, by first converting the scene into text and then converting the text into voice messages. Automatic captioning could also improve Google image search by converting an image into a caption and using its keywords for further related searches. It can likewise be used in surveillance, generating relevant captions from CCTV cameras and raising alarms if any suspicious activity is detected [1].
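
The text-to-voice step mentioned above reduces to a single library call in practice. A minimal sketch follows, assuming the offline pyttsx3 engine, since the paper does not name a specific TTS library; the caption string is an invented example.

```python
import pyttsx3  # assumed offline text-to-speech engine

caption = "a brown dog is running on the green grass"  # example generated caption

engine = pyttsx3.init()
engine.setProperty("rate", 150)  # speaking rate (words per minute)
engine.say(caption)
engine.runAndWait()              # block until the sentence has been spoken
```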

Related Works
Methods
Implementation
Preliminary Identification
Image Captioning and Object Recognition
Conclusions and Future Work
