Abstract

Sentence encoders map sentences to real-valued vectors for use in downstream applications. To peek into these representations, e.g., to increase the interpretability of their results, probing tasks have been designed that query them for linguistic knowledge. However, designing probing tasks for lesser-resourced languages is tricky, because these often lack the large-scale annotated data or (high-quality) dependency parsers that probing task design in English presupposes. To understand how to probe sentence embeddings in such cases, we investigate the sensitivity of probing task results to structural design choices, conducting the first large-scale study of this kind. We show that design choices such as the size of the annotated probing dataset and the type of classifier used for evaluation do (sometimes substantially) influence probing outcomes. We then probe embeddings in a multilingual setup with design choices that lie in a ‘stable region’, as identified for English, and find that results on English do not transfer to other languages. Fairer and more comprehensive sentence-level probing evaluation should thus be carried out on multiple languages in the future.

Highlights

  • Sentence embeddings (a.k.a. sentence encoders) have become ubiquitous in NLP (Kiros et al., 2015; Conneau et al., 2017), extending the concept of word embeddings to the sentence level.

  • In the context of recent efforts to open the black box of deep learning models and representations (Linzen et al., 2019), it has become fashionable to probe sentence embeddings for the linguistic information signals they contain (Perone et al., 2018), as this may not be clear from their performance in downstream tasks.

  • SV-Agree correlates positively with Text Retrieval Conference (TREC) and sentiment in all languages but English. This might be because determining subject-verb agreement is grammatically more complex in the other languages than in English, and storing an adequate amount of grammatical information may be beneficial for certain downstream tasks.


Introduction

Sentence embeddings (a.k.a. sentence encoders) have become ubiquitous in NLP (Kiros et al., 2015; Conneau et al., 2017), extending the concept of word embeddings to the sentence level. In the context of recent efforts to open the black box of deep learning models and representations (Linzen et al., 2019), it has become fashionable to probe sentence embeddings for the linguistic information signals they contain (Perone et al., 2018), as this may not be clear from their performance in downstream tasks. Such probes are linguistic micro tasks, like detecting the length of a sentence or the depth of its dependency tree, that have to be solved by a simple classifier, e.g., logistic regression (LR) or a multi-layer perceptron (MLP), trained on top of a fixed sentence embedding. Probing outcomes, however, may depend on structural design choices. One of them is the size of the training data for probing tasks, as this training data typically needs to be (automatically or manually) annotated, an inherent obstacle in low-resource settings.
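Concretely, a probing task trains a lightweight classifier on frozen sentence embeddings to predict a linguistic property. Below is a minimal sketch under assumed data: random vectors stand in for embeddings from a real encoder, a coarse sentence-length class serves as the probed property, and scikit-learn's logistic regression plays the role of the simple classifier; all names and numbers here are illustrative, not taken from the paper.

```python
# Sketch of a probing task on hypothetical data: train a simple
# classifier to predict a surface property (a coarse sentence-length
# class) from fixed sentence embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for precomputed sentence embeddings (64-dim vectors);
# a real study would obtain these from a frozen sentence encoder.
n, dim = 300, 64
lengths = rng.integers(3, 30, size=n)      # token count per sentence
labels = np.digitize(lengths, [10, 20])    # 3 bins: short / mid / long

# Random vectors with a weak length signal mixed into one dimension,
# purely so the probe has something to find.
emb = rng.normal(size=(n, dim))
emb[:, 0] += lengths / 10.0

# Fit the probe on a training split; evaluate on a held-out split.
probe = LogisticRegression(max_iter=1000).fit(emb[:200], labels[:200])
acc = probe.score(emb[200:], labels[200:])  # probing accuracy
```

The size of the training split (here 200 examples) and the choice of classifier are exactly the kind of design choices whose influence on probing outcomes the study examines.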
