Abstract

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces a la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations, based upon recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transformation that is efficiently learnable using pretrained word vectors and linear regression. The transform can then be applied on the fly whenever a new text feature or rare word is encountered, even if only a single usage example is available. We introduce a new dataset showing that the a la carte method requires fewer examples of words in context to learn high-quality embeddings, and we obtain state-of-the-art results on a nonce task and some unsupervised document classification tasks.

Highlights

  • Distributional word embeddings, which represent the “meaning” of a word via a low-dimensional vector, have been widely applied in natural language processing (NLP) pipelines and algorithms (Goldberg, 2016)

  • A novel solution via a la carte embedding, a method that bootstraps existing high-quality word vectors to learn a feature representation in the same semantic space via a linear transformation of the average word embedding in the feature’s available contexts (a minimal sketch of this learning step follows this list)

  • An overview of widely used datasets is given by Faruqui and Dyer (2014). None of these datasets can be used directly to measure the effect of word frequency on embedding quality, which would help us understand the data requirements of our approach. We address this issue by introducing the Contextual Rare Words (CRW) dataset, a subset of 562 pairs from the Rare Word (RW) dataset (Luong et al., 2013) supplemented, for each rare word, by 255 sentences sampled from the Westbury Wikipedia Corpus (WWC) (Shaoul and Westbury, 2010)
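
The second highlight can be made concrete with a minimal sketch. Assuming a dictionary `vectors` of pretrained word embeddings and a tokenized `corpus` (hypothetical inputs, not artifacts released with the paper), the transform is fit by ordinary least squares so that it maps each word’s average context embedding back onto that word’s pretrained vector:

```python
# Minimal sketch (not the authors' released code) of fitting the
# a la carte transform A with plain linear regression.
# Assumed inputs: `vectors` (dict: word -> np.ndarray of dimension d)
# and `corpus` (list of tokenized sentences); both are hypothetical.
import numpy as np
from collections import defaultdict

def context_averages(vectors, corpus, window=5):
    """Average context-word embedding for every word seen in the corpus."""
    d = len(next(iter(vectors.values())))
    sums, counts = defaultdict(lambda: np.zeros(d)), defaultdict(int)
    for sent in corpus:
        for i, w in enumerate(sent):
            ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            ctx_vecs = [vectors[c] for c in ctx if c in vectors]
            if ctx_vecs:
                sums[w] += np.mean(ctx_vecs, axis=0)
                counts[w] += 1
    return {w: sums[w] / counts[w] for w in counts}

def learn_transform(vectors, corpus, window=5):
    """Least-squares fit of A so that A @ u_w is close to v_w."""
    u = context_averages(vectors, corpus, window)
    words = [w for w in u if w in vectors]
    U = np.stack([u[w] for w in words])        # (n, d) context averages
    V = np.stack([vectors[w] for w in words])  # (n, d) pretrained vectors
    X, *_ = np.linalg.lstsq(U, V, rcond=None)  # minimizes ||U @ X - V||
    return X.T                                 # A = X.T, so A @ u_w ~= v_w
```

Because the fit reduces to a single d-by-d linear regression over words that already have good vectors, it is cheap to compute and can be reused for any feature encountered later.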


Summary

Introduction

Distributional word embeddings, which represent the “meaning” of a word via a low-dimensional vector, have been widely applied in natural language processing (NLP) pipelines and algorithms (Goldberg, 2016). We propose a novel solution via a la carte embedding, a method that bootstraps existing high-quality word vectors to learn a feature representation in the same semantic space via a linear transformation of the average word embedding in the feature’s available contexts. This can be seen as a shallow extension of the distributional hypothesis (Harris, 1954): “a feature is characterized by the words in its context,” rather than the computationally more expensive “a feature is characterized by the features in its context” used implicitly by past work (Rothe and Schütze, 2015; Logeswaran and Lee, 2018).
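
As an illustration of the induction step just described, here is a minimal sketch (again not the authors' code) of how a learned transform `A` and the same hypothetical `vectors` dictionary could embed a previously unseen feature from whatever usage examples are available, even a single one:

```python
# Minimal sketch of inducing an a la carte embedding on the fly.
# `A` is the d-by-d transform fit as sketched earlier; `vectors` is the
# same hypothetical dict of pretrained word embeddings.
import numpy as np

def embed_feature(A, vectors, contexts):
    """Induce a vector for a new feature (rare word, n-gram, synset, ...).

    `contexts` is a list of usage examples, each given as the list of
    words surrounding the feature (the feature tokens themselves excluded).
    """
    per_ctx = [np.mean([vectors[w] for w in ctx if w in vectors], axis=0)
               for ctx in contexts
               if any(w in vectors for w in ctx)]
    u = np.mean(per_ctx, axis=0)  # average context embedding over examples
    return A @ u                  # map into the pretrained semantic space

# Usage: a single example sentence can suffice, e.g. a nonce word whose
# only observed context is one sentence (illustrative input only):
# v_new = embed_feature(A, vectors,
#                       [["we", "found", "a", "small", "one", "in", "the", "tree"]])
```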

