Abstract

We present a new tree-based approach to composing expressive image descriptions that makes use of naturally occurring web images with captions. We investigate two related tasks: image caption generalization and generation, where the former is an optional subtask of the latter. The high-level idea of our approach is to harvest expressive phrases (as tree fragments) from existing image descriptions, then to compose a new description by selectively combining the extracted (and optionally pruned) tree fragments. Key algorithmic components are tree composition and compression, both integrating tree structure with sequence structure. Our proposed system attains significantly better performance than previous approaches for both image caption generalization and generation. In addition, our work is the first to show the empirical benefit of automatically generalized captions for composing natural image descriptions.

Highlights

  • The web is increasingly visual, with hundreds of billions of user-contributed photographs hosted online

  • We tap into the last kind of text, using naturally occurring pairs of images with natural language descriptions to compose expressive descriptions for query images via tree composition and compression

  • We model image caption generalization as sentence compression, though in practical applications we may want the outputs of these two tasks to be different
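The compression view of caption generalization can be illustrated with a toy sketch. This is an assumption-laden simplification, not the paper's learned model: parse trees are nested `(label, children)` tuples, and the branch types and deletion probabilities below (`PP`, `ADJP`, and their scores) are made-up illustrative values, standing in for the learned branch deletion probabilities the paper estimates.

```python
# Toy sketch: caption generalization as sentence compression --
# pruning optional branches from a parse tree. Tree nodes are
# (label, children) tuples; leaves are plain word strings.
# DELETE_PROB values are illustrative, not learned.

DELETE_PROB = {"PP": 0.8, "ADJP": 0.6}  # assumed prunable branch types

def compress(node, threshold=0.5):
    """Drop branches whose deletion probability exceeds the threshold."""
    if isinstance(node, str):                    # leaf word
        return node
    label, children = node
    kept = []
    for child in children:
        child_label = child[0] if isinstance(child, tuple) else None
        if child_label in DELETE_PROB and DELETE_PROB[child_label] > threshold:
            continue                             # prune this branch
        kept.append(compress(child, threshold))
    return (label, kept)

def yield_words(node):
    """Read off the leaf words of a (possibly compressed) tree."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1]:
        words += yield_words(child)
    return words

caption = ("S", [
    ("NP", ["the", "shaggy", "dog"]),
    ("VP", ["sleeps",
            ("PP", ["on", "a", "porch", "in", "Maine"])]),
])

print(" ".join(yield_words(compress(caption))))
# the location PP is pruned, leaving the generalized caption
```

Pruning the prepositional phrase yields "the shaggy dog sleeps" -- a more general caption that transfers to other images of sleeping dogs, which is the point of generalization.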


Summary

Introduction

The web is increasingly visual, with hundreds of billions of user-contributed photographs hosted online. The second direction, in a complementary avenue to the first, has explored ways to make use of the rich spectrum of visual descriptions contributed by online citizens (Kuznetsova et al., 2012; Feng and Lapata, 2013; Mason, 2013; Ordonez et al., 2011). In these approaches, the set of what can be described can be substantially larger than the set of what can be recognized, where the former is shaped and defined by the data, rather than by humans. The high-level idea of our system is to harvest useful bits of text (as tree fragments) from existing image descriptions using detected visual content similarity, and to compose a new description by selectively combining these extracted (and optionally pruned) tree fragments. While this overall idea of composition based on extracted phrases is not new in itself (Kuznetsova et al., 2012), we make several technical and empirical contributions. Our work results in an improved image caption corpus with automatic generalization, which is publicly available.
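The harvest-then-compose idea can be sketched in a few lines. This is a deliberately simplified stand-in for the paper's ILP-based composition, not its actual formulation: the fragments, the fixed slot sequence, and the similarity scores below are all invented for illustration, and the "best fragment per slot" selection replaces the joint optimization the paper actually performs.

```python
# Toy sketch (not the paper's ILP): harvested tree fragments are grouped
# by syntactic slot, each scored by visual similarity to the query image,
# and the best-scoring fragment per slot is stitched into a description.
# All fragments and scores here are made up for illustration.

harvested = {
    "NP": [("a brown dog", 0.91), ("a small puppy", 0.74)],
    "VP": [("runs across the grass", 0.88), ("sits quietly", 0.40)],
    "PP": [("near a wooden fence", 0.66), ("under a table", 0.21)],
}  # slot -> list of (fragment, visual-similarity) pairs

def compose(fragments, slots=("NP", "VP", "PP")):
    """Pick the highest-similarity fragment for each slot and join them."""
    chosen = [max(fragments[slot], key=lambda f: f[1])[0] for slot in slots]
    return " ".join(chosen)

print(compose(harvested))
# prints "a brown dog runs across the grass near a wooden fence"
```

The real system replaces this greedy per-slot choice with a joint integer linear program over fragment selection and ordering, so that the composed sentence is globally coherent rather than locally best per slot.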

Harvesting Tree Fragments
Tree Composition
ILP Variables
Discussion
Tree Compression
Dynamic Programming
Branch Deletion Probabilities
Experiments
Method
Human Evaluation
Related Work
Findings
Conclusion