Abstract

In this study, we propose a self-supervised learning method that distils representations of word meaning in context from a pre-trained masked language model. Word representations are the basis for context-aware lexical semantics and unsupervised semantic textual similarity (STS) estimation. A previous study transforms contextualised representations using static word embeddings to weaken the excessive influence of contextual information. In contrast, the proposed method derives representations of word meaning in context while keeping useful context information intact. Specifically, our method learns to combine the outputs of different hidden layers using self-attention through self-supervised learning with an automatically generated training corpus. To evaluate the performance of the proposed approach, we performed comparative experiments on a range of benchmark tasks. The results confirm that our representations are competitive with the state-of-the-art method that transforms contextualised representations on context-aware lexical semantic tasks and outperform it on STS estimation.
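As a rough illustration of the layer-combination step described in the abstract, the following PyTorch sketch attends over the per-layer hidden states of a single token and pools them into one vector. It is not the authors' implementation; the layer count, hidden size, and the use of the top layer as the attention query are assumptions made for the example, and in practice the hidden states would come from a frozen pre-trained masked language model rather than random tensors.

```python
# Illustrative sketch only: pool a token's per-layer hidden states with self-attention.
# The 13 layers / 768 dimensions correspond to BERT-base and are assumptions here.
import torch
import torch.nn as nn


class LayerCombiner(nn.Module):
    """Attend over a token's per-layer vectors and pool them into a single vector."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.scale = hidden_size ** 0.5

    def forward(self, layer_vectors: torch.Tensor) -> torch.Tensor:
        # layer_vectors: (num_layers, hidden_size), one vector per hidden layer
        q = self.query(layer_vectors[-1])              # query from the top layer (assumption)
        k = self.key(layer_vectors)                    # (num_layers, hidden_size)
        weights = torch.softmax(k @ q / self.scale, dim=0)
        return weights @ layer_vectors                 # weighted sum over layers


if __name__ == "__main__":
    combiner = LayerCombiner(hidden_size=768)
    # Stand-in for the 13 hidden states (embedding layer + 12 layers) of one token.
    hidden_states = torch.randn(13, 768)
    print(combiner(hidden_states).shape)  # torch.Size([768])
```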

Highlights

  • Word representations are the basis for various natural language processing tasks

  • Large-scale masked language models pre-trained on massive corpora, e.g., bi-directional encoder representations from transformers (BERT) (Devlin et al., 2019), embed both the context and meaning of words in their representations

  • We propose a method that distils representations of word meaning in context from masked language models via self-supervised learning

  • We empirically evaluated whether our method distils representations of word meaning in context from a masked language model using context-aware lexical semantic tasks and semantic textual similarity (STS) estimation tasks


Summary

Introduction

Word representations are the basis for various natural language processing tasks. They are crucial as a component in context-aware lexical semantics and in the estimation of unsupervised semantic textual similarity (STS) (Arora et al., 2017; Ethayarajh, 2018; Yokoi et al., 2020). Word representations should capture word meaning in context to improve these downstream tasks. Large-scale masked language models pre-trained on massive corpora, e.g., bi-directional encoder representations from transformers (BERT) (Devlin et al., 2019), embed both the context and meaning of words in their representations. In this study, we propose a method that distils representations of word meaning in context from masked language models via self-supervised learning. The distillation model is self-supervised using an autoencoder to reconstruct the original representations with an automatically generated training corpus. In contrast to the transformation-based approach, our representations preserve useful context information intact.
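The reconstruction objective mentioned above can be pictured with the sketch below, where a small autoencoder is trained to reconstruct a frozen model's per-layer token representations. The architecture, dimensions, loss, and optimiser settings are illustrative assumptions, not the paper's actual model.

```python
# Illustrative sketch of a self-supervised reconstruction objective.
# Shapes (13 layers x 768 dims, as in BERT-base) and hyperparameters are assumptions.
import torch
import torch.nn as nn


class DistilAutoencoder(nn.Module):
    """Encode a token's per-layer vectors into one vector, then reconstruct them."""

    def __init__(self, hidden_size: int = 768, num_layers: int = 13):
        super().__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.encoder = nn.Linear(hidden_size * num_layers, hidden_size)
        self.decoder = nn.Linear(hidden_size, hidden_size * num_layers)

    def forward(self, layer_vectors: torch.Tensor):
        # layer_vectors: (num_layers, hidden_size), taken from a frozen MLM
        distilled = self.encoder(layer_vectors.reshape(-1))  # meaning-in-context vector
        reconstructed = self.decoder(distilled).view(self.num_layers, self.hidden_size)
        return distilled, reconstructed


if __name__ == "__main__":
    model = DistilAutoencoder()
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Random tensors stand in for hidden states extracted from a frozen masked
    # language model over an automatically generated training corpus.
    for layer_vectors in [torch.randn(13, 768) for _ in range(8)]:
        _, reconstructed = model(layer_vectors)
        loss = nn.functional.mse_loss(reconstructed, layer_vectors)
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
```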
