Abstract

Lexical entailment (LE) is a fundamental asymmetric lexico-semantic relation, supporting the hierarchies in lexical resources (e.g., WordNet, ConceptNet) and applications like natural language inference and taxonomy induction. Multilingual and cross-lingual NLP applications warrant models for LE detection that go beyond language boundaries. As part of SemEval 2020, we carried out a shared task (Task 2) on multilingual and cross-lingual LE. The shared task spans three dimensions: (1) monolingual vs. cross-lingual LE, (2) binary vs. graded LE, and (3) a set of 6 diverse languages (and 15 corresponding language pairs). We offered two different evaluation tracks: (a) Dist: for unsupervised, fully distributional models that capture LE solely on the basis of unannotated corpora, and (b) Any: for externally informed models, allowed to leverage any resources, including lexico-semantic networks (e.g., WordNet or BabelNet). In the Any track, we received runs that push the state of the art across all languages and language pairs, for both binary LE detection and graded LE prediction.

Highlights

  • Lexical entailment (LE; hyponymy-hypernymy or is-a relation) is a core asymmetric lexico-semantic relation (Collins and Quillian, 1972; Beckwith et al., 1991) and a crucial building block of lexico-semantic networks and knowledge bases such as WordNet (Fellbaum, 1998), BabelNet (Navigli and Ponzetto, 2012) or ConceptNet (Speer et al., 2017)

  • We have not seen any encouraging results in the Dist track: the two runs submitted for binary LE detection fail to outperform the simple cosine similarity baseline, and we have not received any submissions for graded LE prediction in the Dist track

  • The results achieved by SHIKEBLCU in the graded monolingual LE tasks (Table 4) are even more encouraging and truly push the state-of-the-art in graded multilingual LE prediction – the improvements over GLEN are ≥ 20 Spearman correlation points for low-resource languages in our evaluation (TR, HR, SQ)


Summary

Introduction

Lexical entailment (LE; hyponymy-hypernymy or is-a relation) is a core asymmetric lexico-semantic relation (Collins and Quillian, 1972; Beckwith et al., 1991) and a crucial building block of lexico-semantic networks and knowledge bases such as WordNet (Fellbaum, 1998), BabelNet (Navigli and Ponzetto, 2012) or ConceptNet (Speer et al., 2017). We first created monolingual HyperLex datasets in three target languages: German (DE), Italian (IT), and Croatian (HR), as described by Vulić et al. (2019b). For this shared task, we repeated the procedure for two more languages: Turkish (TR) and our surprise test language, Albanian (SQ). The translation approach has been validated in previous work for creating multilingual semantic similarity datasets (Leviant and Reichart, 2015; Camacho-Collados et al., 2017). Most importantly, it allows for the automatic construction of cross-lingual graded LE datasets. We constructed the cross-lingual datasets automatically, leveraging word pair translations and scores in five target languages. We retained only cross-lingual pairs for which the corresponding monolingual scores differ by ≤ 1.0: this heuristic (Camacho-Collados et al., 2017) mitigates undesirable inter-language semantic shift. For the binary LE detection subtasks, we resorted to the standard F1 measure.
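
The filtering heuristic described above can be illustrated with a minimal Python sketch. It is not the organizers' actual pipeline: the function name `build_cross_lingual_pairs`, the dictionary layout, and the example words and scores are all hypothetical. In particular, assigning the cross-lingual pair the average of the two monolingual scores is an assumption made for illustration; the summary above states only the ≤ 1.0 retention criterion.

```python
def build_cross_lingual_pairs(mono_a, mono_b, max_diff=1.0):
    """Combine two monolingual graded LE datasets into a cross-lingual one.

    mono_a, mono_b: dicts mapping a shared concept-pair id to a tuple
    (word1, word2, score) in language A and language B respectively.
    Returns a list of (word1_in_A, word2_in_B, score) tuples.
    """
    cross = []
    # Only concept pairs translated into both languages can be combined.
    for pair_id in sorted(mono_a.keys() & mono_b.keys()):
        w1_a, _w2_a, score_a = mono_a[pair_id]
        _w1_b, w2_b, score_b = mono_b[pair_id]
        # Retention criterion from the text: monolingual scores must
        # differ by at most max_diff (1.0), otherwise the pair is dropped.
        if abs(score_a - score_b) > max_diff:
            continue
        # ASSUMPTION: the cross-lingual score is the mean of the two
        # monolingual scores; the text does not specify this step.
        cross.append((w1_a, w2_b, (score_a + score_b) / 2.0))
    return cross


# Hypothetical German and Italian entries with made-up scores.
german = {1: ("Hund", "Tier", 5.5), 2: ("Auto", "Fahrzeug", 5.8)}
italian = {1: ("cane", "animale", 5.0), 2: ("auto", "veicolo", 4.0)}

pairs = build_cross_lingual_pairs(german, italian)
print(pairs)
```

In this toy input, the second concept pair is discarded because its monolingual scores differ by 1.8, which is the kind of inter-language semantic shift the heuristic is meant to filter out.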

