Fine-Grained Image Captioning With Global-Local Discriminative Objective

Jie Wu, Hefeng Wu, Tianshui Chen, Liang Lin, Guangchun Luo, Zhi Yang

doi:10.1109/tmm.2020.3011317

Abstract

Image captioning, an active topic at the intersection of vision and language, has seen significant progress in recent years. However, existing methods tend to yield overly general captions composed of the most frequent words/phrases, resulting in inaccurate and indistinguishable descriptions (see Fig. 1). This is primarily due to (i) the conservative nature of traditional training objectives, which drives the model to generate correct but hardly discriminative captions for similar images, and (ii) the uneven word distribution of the ground-truth captions, which encourages generating highly frequent words/phrases while suppressing the less frequent but more concrete ones. In this work, we propose a novel global-local discriminative objective, formulated on top of a reference model, to facilitate generating fine-grained descriptive captions. Specifically, from a global perspective, we design a global discriminative constraint that pulls the generated sentence toward better discerning the corresponding image from all others in the entire dataset. From a local perspective, we propose a local discriminative constraint that increases the attention paid to the less frequent but more concrete words/phrases, thus facilitating the generation of captions that better describe the visual details of the given images. We evaluate the proposed method on the widely used MS-COCO dataset, where it outperforms the baseline methods by a sizable margin and achieves performance competitive with existing leading approaches. We also conduct self-retrieval experiments to demonstrate the discriminability of the proposed method.
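The abstract does not give the exact formulation of either constraint, so the following is only an illustrative sketch of the two ideas, not the authors' method. It assumes that the local constraint can be approximated by weighting words inversely to their corpus frequency, and that the global constraint can be approximated by a hinge-style ranking loss over a caption-image similarity matrix. The names `word_weights`, `global_hinge_loss`, `smoothing`, and `margin` are hypothetical.

```python
import math
from collections import Counter


def word_weights(captions, smoothing=1.0):
    """Assumed stand-in for the local constraint: rarer words get larger weights.

    weight(w) = log((N + smoothing) / (count(w) + smoothing)) + 1,
    where N is the total number of tokens in the corpus.
    """
    counts = Counter(word for caption in captions for word in caption.split())
    total = sum(counts.values())
    return {
        word: math.log((total + smoothing) / (count + smoothing)) + 1.0
        for word, count in counts.items()
    }


def global_hinge_loss(sim, margin=0.2):
    """Assumed stand-in for the global constraint.

    sim[i][j] is a similarity score between generated caption i and image j;
    the matched pairs lie on the diagonal. Every mismatched image whose score
    comes within `margin` of the matched score contributes to the loss.
    """
    n = len(sim)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                loss += max(0.0, margin - sim[i][i] + sim[i][j])
    return loss / n


if __name__ == "__main__":
    captions = ["a man riding a horse", "a man holding a racket"]
    weights = word_weights(captions)
    # Rank words from most to least emphasized under the assumed weighting.
    print(sorted(weights, key=weights.get, reverse=True))
    print(global_hinge_loss([[0.9, 0.1], [0.2, 0.8]]))
```

In a sketch like this, the word weights would scale a per-word training signal so that concrete words such as "riding" count more than "a", while the hinge loss is zero only when each caption scores its own image higher than every other image by at least the margin.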

Journal: IEEE Transactions on Multimedia
Publication Date: Jan 1, 2021
Citations: 54

Similar Papers

Concrete Image Captioning by Integrating Content Sensitive and Global Discriminative Objective
Jie Wu ... Hefeng Wu
01 Jul 2019

Improving Image Captioning with Better Use of Caption
Zhan Shi ... Xu Zhou
01 Jan 2020

Multi-GRU Based Automated Image Captioning for Smartphones
Rumeysa Keskin ... Volkan Kılıç
09 Jun 2021

Generating Human-Like Descriptions for the Given Image Using Deep Learning
Tanvi S Laddha ... N Patel
ITM Web of Conferences | VOL. 53
01 Jan 2023
