Bilingual Lexicon Induction from Non-Parallel Data with Minimal Supervision

Meng Zhang, Haoruo Peng, Yang Liu, Maosong Sun, Huanbo Luan

doi:10.1609/aaai.v31i1.10988

Abstract

Building bilingual lexica from non-parallel data is a long-standing natural language processing research problem that could benefit thousands of resource-scarce languages lacking parallel data. Recent advances in continuous word representations have opened up new possibilities for this task, e.g. establishing a cross-lingual mapping between word embeddings via a seed lexicon. However, this method is unreliable when only a limited number of seeds is available, which is a reasonable setting for resource-scarce languages. We tackle this limitation by introducing a novel matching mechanism into bilingual word representation learning. It captures extra translation pairs exposed by the seeds to incrementally improve the bilingual word embeddings. In our experiments, we find that the matching mechanism substantially improves the quality of the bilingual vector space, which in turn allows us to induce better bilingual lexica with as few as 10 seeds.
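The abstract combines two ideas: a cross-lingual mapping learned from a small seed lexicon, and incremental growth of the training signal with extra translation pairs found along the way. The sketch below illustrates only that general pattern and is not the paper's matching mechanism, which is built into bilingual word representation learning itself. It assumes monolingual embeddings are already available as NumPy arrays, fits an orthogonal (Procrustes) mapping on the current pairs, and adds mutual nearest neighbours as new pairs each round. The function names (`fit_mapping`, `mutual_nn_pairs`, `self_learn`) and the Procrustes and mutual-nearest-neighbour choices are hypothetical, illustrative assumptions.

```python
import numpy as np


def fit_mapping(X_src, Y_tgt, pairs):
    """Orthogonal Procrustes: the orthogonal W minimising ||A W - B||_F
    over the paired rows is U @ Vt, where U S Vt = svd(A.T @ B)."""
    src_idx = [s for s, _ in pairs]
    tgt_idx = [t for _, t in pairs]
    A = X_src[src_idx]
    B = Y_tgt[tgt_idx]
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


def normalize(M):
    """Scale every row to unit length so dot products are cosine similarities."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)


def mutual_nn_pairs(X_src, Y_tgt, W):
    """Return (src, tgt) index pairs that are each other's nearest neighbour
    by cosine similarity after mapping the source space with W."""
    sims = normalize(X_src @ W) @ normalize(Y_tgt).T
    fwd = sims.argmax(axis=1)  # best target for each source word
    bwd = sims.argmax(axis=0)  # best source for each target word
    return {(i, int(fwd[i])) for i in range(len(fwd)) if bwd[fwd[i]] == i}


def self_learn(X_src, Y_tgt, seeds, n_iter=5):
    """Refit the mapping on seeds plus newly matched pairs until the
    pair set stops changing or n_iter rounds have run."""
    pairs = set(seeds)
    W = fit_mapping(X_src, Y_tgt, sorted(pairs))
    for _ in range(n_iter):
        W = fit_mapping(X_src, Y_tgt, sorted(pairs))
        new_pairs = set(seeds) | mutual_nn_pairs(X_src, Y_tgt, W)
        if new_pairs == pairs:
            break
        pairs = new_pairs
    return W, pairs
```

As a usage example, take `X` as a random 50 × 10 source matrix, `R` as a random orthogonal matrix, and `Y = (X @ R)[perm]` as a shuffled target space. Calling `self_learn(X, Y, seeds)` with ten correct seed pairs then recovers `R` and all 50 translation pairs. Real embedding spaces are at best approximately isomorphic, so a practical version would likely also need to filter out low-confidence matches.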

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Feb 12, 2017
Citations: 20

Similar Papers

Bilingual Word Embeddings from Non-Parallel Document-Aligned Data Applied to Bilingual Lexicon Induction
Ivan Vulić ... Marie-Francine Moens
01 Jan 2015

Bilingual Distributed Word Representations from Document-Aligned Comparable Data
Ivan Vulić ... Marie-Francine Moens
Journal of Artificial Intelligence Research | VOL. 55
12 Apr 2016

Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora
...
21 Oct 2021

Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora
Takashi Wada ... Timothy Baldwin
01 Jan 2020