Abstract

Contextualized word embeddings, such as ELMo, provide meaningful representations for words and their contexts. They have been shown to have a great impact on downstream applications. However, we observe that the contextualized embeddings of a word might change drastically when its contexts are paraphrased. As these embeddings are over-sensitive to the context, the downstream model may make different predictions when the input sentence is paraphrased. To address this issue, we propose a post-processing approach to retrofit the embedding with paraphrases. Our method learns an orthogonal transformation on the input space of the contextualized word embedding model, which seeks to minimize the variance of word representations on paraphrased contexts. Experiments show that the proposed method significantly improves ELMo on various sentence classification and inference tasks.
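To make the retrofitting objective concrete, here is a minimal, self-contained PyTorch sketch of the idea, not the paper's implementation: a fixed random linear map stands in for the frozen ELMo encoder, a noisy toy batch stands in for a real group of paraphrased contexts, and the penalty weight is a placeholder.

```python
import torch

torch.manual_seed(0)
dim = 64  # toy dimension; ELMo's actual input space is larger

# Stand-in for the frozen contextual encoder. In the paper the encoder is
# ELMo itself and the learned transform acts on its input space.
frozen_encoder = torch.nn.Linear(dim, dim)
for p in frozen_encoder.parameters():
    p.requires_grad_(False)

def encode(word_inputs, W):
    """Apply the learned transform W to the input vectors, then the
    frozen encoder; one contextual vector per paraphrased context."""
    return frozen_encoder(word_inputs @ W)

def variance_loss(vecs):
    """Spread of the same word's vectors across paraphrased contexts."""
    centroid = vecs.mean(dim=0, keepdim=True)
    return ((vecs - centroid) ** 2).sum(dim=1).mean()

def orthogonality_penalty(W):
    """Soft constraint ||W^T W - I||_F^2 keeping W near-orthogonal."""
    eye = torch.eye(dim)
    return ((W.t() @ W - eye) ** 2).sum()

W = torch.nn.Parameter(torch.eye(dim))
opt = torch.optim.Adam([W], lr=1e-2)

# Toy "paraphrase group": noisy copies of one word-in-context input,
# standing in for the same word observed in paraphrased sentences.
group = torch.randn(1, dim) + 0.1 * torch.randn(8, dim)

for step in range(100):
    loss = variance_loss(encode(group, W)) + 0.1 * orthogonality_penalty(W)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Minimizing the variance term pulls a word's representations on paraphrased contexts together, while the orthogonality penalty keeps the transform norm-preserving so the geometry of the original embedding space is not collapsed.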

Highlights

  • Contextualized word embeddings have been shown to be useful for a variety of downstream tasks (Peters et al., 2018, 2017; McCann et al., 2017)

  • ELMo encodes a sentence into a 1,024-dimensional vector by averaging the representations of its top layer. We compare these baselines with four variants of paraphrase-aware retrofitting (PAR) built upon ELMo, each trained on a different paraphrase corpus

  • As in the sentence classification tasks, we apply a Multi-Layer Perceptron (MLP) with the same hyperparameters to perform the classification (illustrated in the sketch after this list)
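The pipeline in the two highlights above can be illustrated with a short sketch, assuming the allennlp 0.x `ElmoEmbedder` interface; the MLP sizes and the two-class output are illustrative placeholders, not the paper's hyperparameters.

```python
import torch
from allennlp.commands.elmo import ElmoEmbedder  # allennlp 0.x interface

elmo = ElmoEmbedder()  # loads the default pretrained ELMo weights

def sentence_vector(tokens):
    """Average the top-layer ELMo representations into one 1,024-d vector."""
    layers = elmo.embed_sentence(tokens)  # shape: (3, len(tokens), 1024)
    return layers[-1].mean(axis=0)        # top layer, mean over tokens

# A small MLP classifier over the fixed sentence vectors; hidden size and
# number of output classes here are placeholders.
mlp = torch.nn.Sequential(
    torch.nn.Linear(1024, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 2),
)

vec = torch.from_numpy(sentence_vector(["The", "movie", "was", "great", "."]))
logits = mlp(vec.float())
```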


Summary

Introduction

Contextualized word embeddings have been shown to be useful for a variety of downstream tasks (Peters et al., 2018, 2017; McCann et al., 2017). Unlike traditional word embeddings, which represent words with fixed vectors, these embedding models encode both words and their contexts and generate context-specific representations. While contextualized embeddings are useful, we observe that a language-model-based embedding model, ELMo (Peters et al., 2018), cannot accurately capture the semantic equivalence of contexts. When the contexts of a word have equivalent or similar meanings but differ in sentence structure or word order, ELMo may assign very different representations to the word.
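One way to observe this over-sensitivity is to compare a word's top-layer ELMo vectors across two paraphrased sentences. The sketch below does so, again assuming the allennlp 0.x `ElmoEmbedder` interface; the sentences are illustrative and not drawn from the paper.

```python
import numpy as np
from allennlp.commands.elmo import ElmoEmbedder  # allennlp 0.x interface

elmo = ElmoEmbedder()

def word_vector(tokens, index):
    """Top-layer ELMo vector of the token at `index`."""
    return elmo.embed_sentence(tokens)[-1][index]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The same word "bank" in two paraphrased contexts that differ only in
# word order; a similarity noticeably below 1.0 reflects the issue.
v1 = word_vector(["She", "deposited", "cash", "at", "the", "bank", "."], 5)
v2 = word_vector(["At", "the", "bank", ",", "she", "deposited", "cash", "."], 2)
print(cosine(v1, v2))
```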

