A Comprehensive Analysis of Real-World Image Captioning and Scene Identification

doi:10.33140/jeee.02.03.14

Abstract

Image captioning is a computer vision task that involves generating natural language descriptions for images. It has applications in many domains, including image retrieval, medicine, and industry. However, although image captioning has been studied extensively, most work has focused on high-quality images or controlled environments and has not explored the challenges of real-world image captioning. Real-world scenes are complex and dynamic, with numerous points of attention, and the images are often of very poor quality, which makes the task challenging even for humans. This paper evaluates the performance of various models built on different encoding mechanisms, language decoders, and training procedures, using a newly created real-world dataset of more than 800 images spanning over 65 scene classes, built from the MIT Indoor Scenes dataset. The dataset is captioned using the IC3 approach, which generates more descriptive captions by summarizing the details that standard image captioning models capture from different viewpoints of the image.
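
The captioning step described above (several candidate captions merged into one richer description) can be illustrated with a minimal sketch. It is not the authors' implementation or the IC3 reference code: the function names `committee_caption`, `toy_sampler`, and `toy_summarizer`, the image identifier, and the candidate count are all hypothetical, and the toy stand-ins replace what would in practice be a captioning model and a text summarizer.

```python
from typing import Callable, List


def committee_caption(
    image_id: str,
    sample_caption: Callable[[str, int], str],
    summarize: Callable[[List[str]], str],
    num_candidates: int = 5,
) -> str:
    """Draw several candidate captions, then merge them into one description."""
    candidates: List[str] = []
    for seed in range(num_candidates):
        caption = sample_caption(image_id, seed).strip()
        # Drop empty strings and exact duplicates so the summarizer
        # only sees distinct details.
        if caption and caption not in candidates:
            candidates.append(caption)
    if not candidates:
        raise ValueError("no candidate captions were produced")
    return summarize(candidates)


# Toy stand-ins so the sketch runs without any model (both hypothetical).
def toy_sampler(image_id: str, seed: int) -> str:
    views = [
        "a kitchen with a wooden table",
        "a stove next to a window",
        "a kitchen with a wooden table",  # repeated view, filtered out above
    ]
    return views[seed % len(views)]


def toy_summarizer(captions: List[str]) -> str:
    # A real summarizer would rewrite the candidates into fluent text;
    # this one simply joins them.
    return "; ".join(captions)


merged = committee_caption("kitchen_001.jpg", toy_sampler, toy_summarizer, num_candidates=3)
print(merged)
```

Passing the sampler and summarizer as callables keeps the sketch independent of any particular captioning model or language model, so either can be swapped without changing the merging logic.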

More From: Journal of Electrical Electronics Engineering
