Abstract
Standard image caption generation systems produce generic descriptions of images and do not utilize any contextual information or world knowledge. In particular, they are unable to generate captions that contain references to the geographic context of an image, for example, the location where a photograph is taken or relevant geographic objects around an image location. In this paper, we develop a geo-aware image caption generation system, which incorporates geographic contextual information into a standard image captioning pipeline. We propose a way to build an image-specific representation of the geographic context and adapt the caption generation network to produce appropriate geographic names in the image descriptions. We evaluate our system on a novel captioning dataset that contains contextualized captions and geographic metadata and achieve substantial improvements in BLEU, ROUGE, METEOR and CIDEr scores. We also introduce a new metric to assess generated geographic references directly and empirically demonstrate our system’s ability to produce captions with relevant and factually accurate geographic referencing.
Highlights
Image caption generation is a popular task that aims at producing a natural language description of a given image
A standard neural image captioning system consists of two stages: an “encoder”, a Convolutional Neural Network that encodes the visual features of an image as a vector, and a “decoder”, a language model that is initialized with this vector and generates a caption word by word (a minimal sketch of this pipeline follows the list below)
In this paper we present geo-aware image captioning, where geographic contextual information is incorporated into the generated captions
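The encoder-decoder pipeline described in the second highlight can be sketched as follows. This is a minimal, illustrative sketch assuming PyTorch and torchvision; the module names, dimensions, and hyperparameters are ours, not the paper's.

```python
# Minimal sketch of a standard encoder-decoder captioner (PyTorch assumed).
# All names and sizes here are illustrative, not taken from the paper.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class CaptionDecoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, feat_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.init_h = nn.Linear(feat_dim, hidden_dim)  # image vector -> initial hidden state
        self.init_c = nn.Linear(feat_dim, hidden_dim)  # image vector -> initial cell state
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_feats, tokens):
        # The language model is initialized with the encoded image vector,
        # then predicts the caption one word at a time.
        h0 = self.init_h(image_feats).unsqueeze(0)
        c0 = self.init_c(image_feats).unsqueeze(0)
        hidden, _ = self.lstm(self.embed(tokens), (h0, c0))
        return self.out(hidden)  # per-step vocabulary logits

# "Encoder": a CNN with its classification head removed yields one image vector.
cnn = resnet50(weights=None)
cnn.fc = nn.Identity()
image_feats = cnn(torch.randn(2, 3, 224, 224))            # (2, 2048)

decoder = CaptionDecoder(vocab_size=10_000)
logits = decoder(image_feats, torch.randint(0, 10_000, (2, 12)))
print(logits.shape)                                       # torch.Size([2, 12, 10000])
```

At inference time the same decoder is run autoregressively, feeding each predicted word back in as the next input until an end-of-sequence token is produced.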
Summary
Image caption generation is a popular task that aims at producing a natural language description of a given image. A standard neural image captioning system consists of two stages: an “encoder”, a Convolutional Neural Network that encodes the visual features of an image as a vector, and a “decoder”, a language model that is initialized with this vector and generates a caption word by word. People tend to describe images by interpreting them in light of contextual factors and world knowledge, whereas standard encoder-decoder captioning systems take no contextual or world knowledge into account. One aspect missing from standard caption generation systems is the ability to produce image descriptions influenced by the geographic context, i.e. the geographic objects surrounding the image location. Consider the photograph in Figure 1, for which the automatically generated caption reads: “a park bench sitting in the middle of a park”.
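The paper builds an image-specific representation of the geographic context and adapts the decoder to produce geographic names. As a rough, hedged illustration of the general idea only (the encoding scheme, the fusion by concatenation, and every name below are our assumptions, not the paper's actual method), one could encode nearby geographic objects into a context vector and fuse it with the image vector before initializing the decoder:

```python
# Hypothetical sketch: fusing a geographic-context vector with the image
# vector. The paper's actual representation and fusion may differ.
import torch
import torch.nn as nn

class GeoContextEncoder(nn.Module):
    """Pools nearby geographic objects (illustrated here as type/distance
    pairs) into a single context vector. All dimensions are illustrative."""
    def __init__(self, num_object_types, embed_dim=128, out_dim=256):
        super().__init__()
        self.type_embed = nn.Embedding(num_object_types, embed_dim)
        self.proj = nn.Linear(embed_dim + 1, out_dim)  # +1 for the distance feature

    def forward(self, object_types, distances):
        # object_types: (batch, n_objects), distances: (batch, n_objects)
        e = self.type_embed(object_types)
        x = torch.cat([e, distances.unsqueeze(-1)], dim=-1)
        x = torch.relu(self.proj(x))
        return x.mean(dim=1)  # average-pool over the nearby objects

# Fusion: concatenate image and geo vectors, then initialize the caption
# decoder from the joint representation instead of the image vector alone.
image_feats = torch.randn(2, 2048)
geo = GeoContextEncoder(num_object_types=500)
geo_feats = geo(torch.randint(0, 500, (2, 8)), torch.rand(2, 8))
joint = torch.cat([image_feats, geo_feats], dim=-1)       # (2, 2048 + 256)
```

Producing the correct geographic names in the output additionally requires the decoder's vocabulary (or a copy/pointer mechanism) to cover the names present in the context, which is part of what the paper's adapted generation network addresses.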