Abstract

As automated image analysis progresses, there is increasing interest in richer linguistic annotation of pictures, with attributes of objects (e.g., furry, brown…) attracting most attention. By building on the recent “zero-shot learning” approach, and paying attention to the linguistic nature of attributes as noun modifiers, and specifically adjectives, we show that it is possible to tag images with attribute-denoting adjectives even when no training data containing the relevant annotation are available. Our approach relies on two key observations. First, objects can be seen as bundles of attributes, typically expressed as adjectival modifiers (a dog is something furry, brown, etc.), and thus a function trained to map visual representations of objects to nominal labels can implicitly learn to map attributes to adjectives. Second, objects and attributes come together in pictures (the same thing is a dog and it is brown). We can thus achieve better attribute (and object) label retrieval by treating images as “visual phrases”, and decomposing their linguistic representation into an attribute-denoting adjective and an object-denoting noun. Our approach performs comparably to a method exploiting manual attribute annotation, outperforms various competitive alternatives in both attribute and object annotation, and automatically constructs attribute-centric representations that significantly improve performance in supervised object recognition.
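
The cross-modal mapping at the heart of this approach is not spelled out in the abstract; the sketch below illustrates one plausible instantiation, assuming a linear (ridge) regression from visual feature vectors to distributional word vectors, followed by nearest-neighbor label retrieval. All data, dimensionalities, and vocabulary names are hypothetical placeholders, not the authors' exact setup.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical training data: each image is paired with the
# distributional (corpus-derived) vector of its object label.
rng = np.random.default_rng(0)
visual_train = rng.standard_normal((500, 1000))  # e.g., PHOW descriptors
word_train = rng.standard_normal((500, 300))     # e.g., word vectors of nouns

# Learn a linear cross-modal map from visual to linguistic space.
# Ridge regression is one reasonable regressor; the paper may differ.
mapper = Ridge(alpha=1.0).fit(visual_train, word_train)

# Project an unseen image into linguistic space and rank vocabulary
# words by cosine similarity (zero-shot labeling: no attribute-annotated
# training images are needed).
vocab_words = ["dog", "furry", "car", "red"]        # toy vocabulary
vocab_vecs = rng.standard_normal((4, 300))

test_image = rng.standard_normal((1, 1000))
proj = mapper.predict(test_image)[0]
sims = vocab_vecs @ proj / (np.linalg.norm(vocab_vecs, axis=1)
                            * np.linalg.norm(proj) + 1e-9)
print([vocab_words[i] for i in np.argsort(-sims)])
```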

Highlights

  • As the quality of image analysis algorithms improves, there is increasing interest in annotating images with linguistic descriptions, ranging from single words describing the depicted objects and their properties (Farhadi et al., 2009; Lampert et al., 2009) to richer expressions such as full-fledged image captions (Kulkarni et al., 2011; Mitchell et al., 2012).

  • Russakovsky and Fei-Fei (2010) trained separate SVM classifiers for each attribute in the evaluation dataset in a cross-validation setting. This fully supervised approach can be seen as an ambitious upper bound for zero-shot learning, and we directly compare our performance to theirs using their figure of merit, namely area under the ROC curve (AUC), which is commonly used for binary classification problems (a minimal sketch of this protocol follows these highlights).

  • The combined FUSED approach outperforms both representations by a large margin (35.81%), confirming that the linguistically-enriched information brought by DEC is to a certain extent complementary to the lower-level visual evidence directly exploited by PHOW (see the fusion step in the sketch below).
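
The second and third highlights refer, respectively, to a supervised evaluation protocol (one binary SVM per attribute, scored by AUC under cross-validation) and to combining the DEC and PHOW representations. The sketch below illustrates both under stated assumptions: fusion is modeled as plain feature concatenation (one plausible scheme, not necessarily the authors' exact one), and all feature matrices and labels are randomly generated placeholders.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

# Hypothetical features: low-level PHOW visual descriptors and DEC
# (decomposed linguistic) representations for the same n images.
rng = np.random.default_rng(0)
n = 400
phow = rng.standard_normal((n, 600))
dec = rng.standard_normal((n, 300))
has_attribute = rng.integers(0, 2, n)  # binary label for one attribute

# FUSED: simple concatenation of the two representations.
fused = np.hstack([phow, dec])

# One binary SVM per attribute, scored by AUC under cross-validation,
# mirroring the Russakovsky and Fei-Fei (2010) protocol.
scores = cross_val_predict(LinearSVC(), fused, has_attribute,
                           cv=5, method="decision_function")
print("AUC:", roc_auc_score(has_attribute, scores))
```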

Summary

Introduction

As the quality of image analysis algorithms improves, there is increasing interest in annotating images with linguistic descriptions, ranging from single words describing the depicted objects and their properties (Farhadi et al., 2009; Lampert et al., 2009) to richer expressions such as full-fledged image captions (Kulkarni et al., 2011; Mitchell et al., 2012). While this correlation is smaller than for object-noun data (0.23), we conjecture that it is sufficient for zero-shot learning of attributes. We confirm this by testing a cross-modal projection function from attributes, such as colors and shapes, onto adjectives in linguistic semantic space, trained on pre-existing annotated datasets covering fewer than 100 attributes (Experiment 1). We then turn to recent work in distributional semantics defining a vector decomposition framework (Dinu and Baroni, 2014) which, given a vector encoding the meaning of a phrase, aims at decoupling its constituents, producing vectors that can be matched to a sequence of words that best captures the semantics of the phrase. We adopt this framework to decompose image representations projected onto linguistic space into an adjective-noun phrase (a minimal sketch follows below). In addition to contributing to image annotation, our work suggests new test beds for distributional semantic representations of nouns and associated adjectives, and provides more in-depth evidence of the potential of the decompositional approach.
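
To make the decomposition step concrete, here is a minimal sketch assuming a weighted-additive composition model, i.e., a phrase vector is approximated as phrase ≈ alpha·adj + beta·noun, so a projected image vector can be split into a candidate adjective and noun by searching small vocabularies. This additive model with fixed weights is a simplifying assumption for illustration; Dinu and Baroni (2014) learn their (de)composition functions from corpus data, and all vocabularies below are toy placeholders.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def decompose(phrase_vec, noun_vecs, adj_vecs, alpha=0.5, beta=0.5):
    """Split a phrase vector into its best adjective-noun pair,
    assuming additive composition: phrase ~ alpha*adj + beta*noun."""
    best = None
    for noun, nv in noun_vecs.items():
        # Solve for the adjective vector implied by this noun choice.
        adj_estimate = (phrase_vec - beta * nv) / alpha
        adj, sim = max(((a, cosine(adj_estimate, av))
                        for a, av in adj_vecs.items()), key=lambda x: x[1])
        # Heuristic score: adjective fit plus noun fit to the phrase.
        total = sim + cosine(nv, phrase_vec)
        if best is None or total > best[0]:
            best = (total, adj, noun)
    return best[1], best[2]

# Hypothetical tiny vocabularies of 300-d word vectors.
rng = np.random.default_rng(0)
nouns = {w: rng.standard_normal(300) for w in ["dog", "car", "apple"]}
adjs = {w: rng.standard_normal(300) for w in ["furry", "red", "metallic"]}
phrase = 0.5 * adjs["furry"] + 0.5 * nouns["dog"]
print(decompose(phrase, nouns, adjs))  # expected: ('furry', 'dog')
```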

Cross-Modal Mapping
Decomposition
Representational Spaces
Evaluation Dataset
Experiment 1
Cross-modal training and evaluation
Results and discussion
Experiment 2
Cross-modal training
Object-agnostic models
Object-informed models
Results
Using DEC for attribute-based object classification
Conclusion