Abstract

This paper reports preliminary experiments that test the conjecture that semantic compositionality is a general process, irrespective of the underlying modality. In particular, we model the composition of an attribute with an object in the visual modality in the same way as the composition of an adjective with a noun in the linguistic modality. Our experiments show that the concept topologies in the two modalities share similarities, a result that supports our conjecture.

1 Language and Vision

Recently, fields such as computational linguistics and computer vision have converged on a common way of capturing and representing the linguistic and visual information of atomic concepts: vector space models. At the same time, advances in computational semantics have led to effective and linguistically inspired approaches for extending such methods from single concepts to arbitrary linguistic units (e.g. phrases) by means of vector-based semantic composition (Mitchell and Lapata, 2010). Compositionality is an important component not only from a linguistic perspective but also from a cognitive perspective, and there have been efforts to validate it as a general cognitive process. In computer vision, however, compositionality has so far received limited attention. In this work, we therefore study the phenomenon of visual compositionality, complementing the limited previous literature, which has focused on event compositionality (St¨

Highlights

  • Compositionality is an important component not only from a linguistic perspective but also from a cognitive perspective, and there have been efforts to validate it as a general cognitive process

  • Our approach to compositionality is inspired by the Lexical Functional (LF) model (Baroni and Zamparelli, 2010), under which adjectives are represented as linear functions in linguistic composition

  • We propose to import the LF method into the visual modality, with the aim of developing a Visual Compositional Model


Summary

Introduction

Compositionality is an important component not only from a linguistic perspective but also from a cognitive perspective, and there have been efforts to validate it as a general cognitive process. Our work consists of learning vector representations of attribute-object pairs (e.g., "red car", "cute dog") and of objects (e.g., "car", "dog", "truck", "cat"), and using them to compute the representations of new objects bearing the same attributes ("red truck", "cute cat"). This question has both theoretical and applied impact. Similarly to the case of linguistic compositionality, each attribute function $f^V_{attr}$ is induced from image-harvested vector representations of attribute-object phrases $v_i \in V_{phrase}$ and objects $v_j \in V_{object}$; e.g., for training the function $f^V_{red}$ the following data can be used: $(v_{red\ car}, v_{car})$, $(v_{red\ flag}, v_{flag})$,
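The induction of an attribute function from (phrase, object) vector pairs can be illustrated with a minimal sketch. It assumes that the linear function is a square matrix fitted by ridge-regularised least squares, which is one common way to estimate such a map and is not necessarily the estimator used in the paper. The function names, the `ridge` parameter, and the synthetic vectors are all hypothetical and stand in for image-harvested representations.

```python
import numpy as np

def fit_attribute_function(object_vecs, phrase_vecs, ridge=1e-3):
    """Fit a matrix W such that object_vecs @ W approximates phrase_vecs.

    Assumption: ridge-regularised least squares; the source only states
    that attributes are represented as linear functions.
    """
    X = np.asarray(object_vecs, dtype=float)   # rows: v_car, v_flag, ...
    Y = np.asarray(phrase_vecs, dtype=float)   # rows: v_red_car, v_red_flag, ...
    dim = X.shape[1]
    # Closed-form ridge solution: W = (X^T X + ridge * I)^{-1} X^T Y
    return np.linalg.solve(X.T @ X + ridge * np.eye(dim), X.T @ Y)

def apply_attribute(W, object_vec):
    """Compose the learned attribute with an unseen object vector."""
    return np.asarray(object_vec, dtype=float) @ W

# Synthetic stand-ins for visual vectors (illustrative data only).
rng = np.random.default_rng(0)
dim = 5
true_W = rng.normal(size=(dim, dim))           # hidden "red" transformation
train_objects = rng.normal(size=(50, dim))     # e.g. v_car, v_flag, ...
train_phrases = train_objects @ true_W         # e.g. v_red_car, v_red_flag, ...

W_red = fit_attribute_function(train_objects, train_phrases, ridge=1e-8)

v_truck = rng.normal(size=dim)                 # an object unseen with "red"
v_red_truck = apply_attribute(W_red, v_truck)  # predicted "red truck" vector
```

With noise-free synthetic data the fitted matrix recovers the hidden transformation closely. With real image-derived vectors the fit would only be approximate, and the regularisation strength would need tuning.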

