Abstract

Sentence encoders map sentences to real-valued vectors for use in downstream applications. To peek into these representations, e.g., to increase the interpretability of their results, probing tasks have been designed that query them for linguistic knowledge. However, designing probing tasks for lesser-resourced languages is tricky, because these often lack the large-scale annotated data or (high-quality) dependency parsers that probing task design in English presupposes. To understand how to probe sentence embeddings in such cases, we investigate the sensitivity of probing task results to structural design choices, conducting the first large-scale study of this kind. We show that design choices such as the size of the annotated probing dataset and the type of classifier used for evaluation do (sometimes substantially) influence probing outcomes. We then probe embeddings in a multilingual setup with design choices that lie in a ‘stable region’, as identified for English, and find that results on English do not transfer to other languages. Fairer and more comprehensive sentence-level probing evaluation should thus be carried out on multiple languages in the future.

Highlights

  • Sentence embeddings (a.k.a. sentence encoders) have become ubiquitous in NLP (Kiros et al., 2015; Conneau et al., 2017), extending the concept of word embeddings to the sentence level.

  • In the context of recent efforts to open the black box of deep learning models and representations (Linzen et al., 2019), it has become fashionable to probe sentence embeddings for the linguistic information signals they contain (Perone et al., 2018), as this may not be clear from their performance in downstream tasks.

  • SV-Agree correlates positively with Text Retrieval Conference (TREC) and sentiment in all languages but English. This might be because determining subject-verb agreement is grammatically more complex in the other languages than in English, and storing an adequate amount of grammatical information may be beneficial for certain downstream tasks.


Introduction

Sentence embeddings (a.k.a. sentence encoders) have become ubiquitous in NLP (Kiros et al., 2015; Conneau et al., 2017), extending the concept of word embeddings to the sentence level. In the context of recent efforts to open the black box of deep learning models and representations (Linzen et al., 2019), it has become fashionable to probe sentence embeddings for the linguistic information signals they contain (Perone et al., 2018), as this may not be clear from their performance in downstream tasks. Such probes are linguistic micro tasks, like detecting the length of a sentence or the depth of its dependency tree, that have to be solved by a simple classifier, e.g., logistic regression (LR) or a multi-layer perceptron (MLP), trained on top of a fixed sentence embedding. Probing outcomes, however, may depend on structural design choices. One of them is the size of the training data for probing tasks, as this training data typically needs to be (automatically or manually) annotated, an inherent obstacle in low-resource settings.
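Concretely, a probing task trains a lightweight classifier on frozen sentence embeddings to predict a linguistic property. Below is a minimal sketch under assumed data: random vectors stand in for embeddings from a real encoder, a coarse sentence-length class serves as the probed property, and scikit-learn's logistic regression plays the role of the simple classifier; all names and numbers here are illustrative, not taken from the paper.

```python
# Sketch of a probing task on hypothetical data: train a simple
# classifier to predict a surface property (a coarse sentence-length
# class) from fixed sentence embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for precomputed sentence embeddings (64-dim vectors);
# a real study would obtain these from a frozen sentence encoder.
n, dim = 300, 64
lengths = rng.integers(3, 30, size=n)      # token count per sentence
labels = np.digitize(lengths, [10, 20])    # 3 bins: short / mid / long

# Random vectors with a weak length signal mixed into one dimension,
# purely so the probe has something to find.
emb = rng.normal(size=(n, dim))
emb[:, 0] += lengths / 10.0

# Fit the probe on a training split; evaluate on a held-out split.
probe = LogisticRegression(max_iter=1000).fit(emb[:200], labels[:200])
acc = probe.score(emb[200:], labels[200:])  # probing accuracy
```

The size of the training split (here 200 examples) and the choice of classifier are exactly the kind of design choices whose influence on probing outcomes the study examines.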
