Abstract

Several recent studies have shown the benefits of combining language and perception to infer word embeddings. These multimodal approaches either simply combine pre-trained textual and visual representations (e.g. features extracted from convolutional neural networks), or use the latter to bias the learning of textual word embeddings. In this work, we propose a novel probabilistic model to formalize how linguistic and perceptual inputs can work in concert to explain the observed word-context pairs in a text corpus. Our approach learns textual and visual representations jointly: latent visual factors couple together a skip-gram model for co-occurrence in linguistic data and a generative latent variable model for visual data. Extensive experimental studies validate the proposed model. Concretely, on the tasks of assessing pairwise word similarity and image/caption retrieval, our approach attains equally competitive or stronger results when compared to other state-of-the-art multimodal models.
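
To make the coupling described above concrete, the following is a minimal sketch, in PyTorch, of how shared latent visual factors might tie a skip-gram objective with negative sampling to a generative reconstruction of pre-computed visual features. It is an illustration of the general idea only, not the paper's actual formulation: the class and parameter names, the linear decoders, and the Gaussian-style (squared-error) visual likelihood are all assumptions.

```python
# Illustrative sketch only (assumed architecture, not the paper's exact model):
# a shared set of latent visual factors both biases a skip-gram objective and
# reconstructs pre-computed visual features (e.g. CNN activations), so that
# gradients from both modalities shape the same factors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointTextVisionSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim, latent_dim, visual_dim):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, embed_dim)         # target-word embeddings
        self.ctx_emb = nn.Embedding(vocab_size, embed_dim)          # context-word embeddings
        self.visual_factors = nn.Embedding(vocab_size, latent_dim)  # latent visual factors per word
        self.to_text = nn.Linear(latent_dim, embed_dim)             # couples visual factors to the text model
        self.to_visual = nn.Linear(latent_dim, visual_dim)          # decodes factors into visual-feature space

    def skipgram_loss(self, targets, contexts, negatives):
        # Skip-gram with negative sampling; visual factors bias the target representation.
        t = self.word_emb(targets) + self.to_text(self.visual_factors(targets))
        pos = F.logsigmoid((t * self.ctx_emb(contexts)).sum(-1))
        neg = F.logsigmoid(-(t.unsqueeze(1) * self.ctx_emb(negatives)).sum(-1)).sum(-1)
        return -(pos + neg).mean()

    def visual_loss(self, words, visual_feats):
        # Squared-error reconstruction of pre-computed visual features from the same
        # latent factors, so both modalities constrain them.
        recon = self.to_visual(self.visual_factors(words))
        return F.mse_loss(recon, visual_feats)

# Usage sketch: sum the two terms for words that have images.
model = JointTextVisionSketch(vocab_size=10000, embed_dim=300, latent_dim=64, visual_dim=4096)
targets = torch.randint(0, 10000, (32,))
contexts = torch.randint(0, 10000, (32,))
negatives = torch.randint(0, 10000, (32, 5))
visual_feats = torch.randn(32, 4096)
loss = model.skipgram_loss(targets, contexts, negatives) + model.visual_loss(targets, visual_feats)
loss.backward()
```

Because the same latent factors appear in both terms, gradients from the visual reconstruction shape the representations used to predict word contexts, which is the coupling the abstract refers to.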

Highlights

  • Continuous-valued vector representation of words has been one of the key components in neural architectures for natural language processing (Mikolov et al., 2013; Pennington et al., 2014; Levy and Goldberg, 2014).

  • We develop a new model which jointly learns word embeddings from text and extracts latent visual information, from pre-computed visual features, that could supplement the linguistic embeddings in modeling the co-occurrence of words and their contexts in a corpus.

  • We propose PIXIE, a novel probabilistic model that joins textual and perceptual information to infer multimodal word embeddings.

Summary

Introduction

Continuous-valued vector representation of words has been one of the key components in neural architectures for natural language processing (Mikolov et al., 2013; Pennington et al., 2014; Levy and Goldberg, 2014). The embeddings produced by such models do not necessarily reflect all inherent aspects of human semantic knowledge, such as the perceptual aspect (Feng and Lapata, 2010). This has motivated many researchers to explore different ways to infuse visual information, often represented in the form of pre-computed visual features, into word embeddings (Kiela and Bottou, 2014; Silberer et al., 2017; Collell et al., 2017; Lazaridou et al., 2015). In our model, the latent visual factors extracted from such features improve the modeling of word-context co-occurrences in text data. Another appealing property of our model is its natural ability to propagate perceptual information to the embeddings of words lacking visual features (e.g., abstract words) during learning. We show matching or stronger performance when compared to other state-of-the-art approaches for learning multimodal embeddings.
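
The propagation property mentioned above can be illustrated with a small, hedged sketch (the masking scheme below is an assumption, not the paper's exact training procedure): words without pre-computed visual features simply contribute no visual reconstruction term, yet their embeddings are still updated by the shared skip-gram term, which ties them to visually grounded words through co-occurrence.

```python
# Illustrative sketch (assumed, not the paper's exact training loop): mask the visual
# reconstruction term for words that lack pre-computed visual features. Their embeddings
# are still trained by the skip-gram term, so perceptual information reaches them
# indirectly through co-occurrence with visually grounded words.
import torch

def joint_loss(skipgram_loss, recon, visual_feats, has_image, visual_weight=1.0):
    # skipgram_loss: scalar text loss for the batch
    # recon, visual_feats: (batch, visual_dim) predicted and observed visual features
    # has_image: (batch,) boolean mask, True where the word has visual features
    per_word = ((recon - visual_feats) ** 2).mean(dim=-1)   # squared error per word
    mask = has_image.float()
    visual = (per_word * mask).sum() / mask.sum().clamp(min=1.0)
    return skipgram_loss + visual_weight * visual
```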

Setup and Background
Joint Visual and Text Modeling
Approximate Inference and Learning
Related Work
Experiments
Task 1
Main results
Qualitative analysis
Task 2
Results
Conclusion
