Bilingual embeddings with random walks over multilingual wordnets

Josu Goikoetxea, Aitor Soroa, Eneko Agirre

doi:10.1016/j.knosys.2018.03.017

Abstract

Bilingual word embeddings represent words of two languages in the same space, and allow knowledge to be transferred from one language to the other without machine translation. The main approach is to train monolingual embeddings first and then map them using bilingual dictionaries. In this work, we present a novel method to learn bilingual embeddings based on multilingual knowledge bases (KBs) such as WordNet. Our method extracts bilingual information from multilingual wordnets via random walks and learns a joint embedding space in one go. We further reinforce cross-lingual equivalence by adding bilingual constraints to the loss function of the popular Skip-gram model. Our experiments on twelve cross-lingual word similarity and relatedness datasets in six language pairs covering four languages show that: 1) our method outperforms the state-of-the-art mapping method using dictionaries; 2) multilingual wordnets on their own improve over text-based systems in similarity datasets; 3) the combination of wordnet-generated information and text is key for good results. Our method can be applied to richer KBs like DBpedia or BabelNet, and can be easily extended to multilingual embeddings. All our software and resources are open source.
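
As a rough illustration of the idea of extracting bilingual information from a multilingual wordnet via random walks, the sketch below walks over a tiny hand-made synset graph and, at each visited synset, emits a word in one of two languages. The result is a pseudo-corpus of mixed-language "sentences" that a Skip-gram trainer could consume. Everything here is an assumption made for illustration: the toy graph, the synset identifiers, the uniform choice of language and lexicalization, the `restart_prob` and `max_len` parameters, and the function names are all hypothetical, and the abstract does not specify the authors' actual procedure or their bilingual constraints.

```python
import random

# Hypothetical toy multilingual wordnet: synsets are nodes, relations are edges.
GRAPH = {
    "dog.n.01": ["canine.n.02", "puppy.n.01"],
    "canine.n.02": ["dog.n.01"],
    "puppy.n.01": ["dog.n.01"],
}

# Hypothetical lexicalizations of each synset in two languages.
LEXICON = {
    "dog.n.01": {"en": ["dog"], "es": ["perro"]},
    "canine.n.02": {"en": ["canine"], "es": ["cánido"]},
    "puppy.n.01": {"en": ["puppy"], "es": ["cachorro"]},
}


def random_walk(graph, lexicon, start, rng, restart_prob=0.15, max_len=20):
    """Walk the graph from `start`, emitting one word per visited synset."""
    sentence = []
    node = start
    while len(sentence) < max_len:
        # Assumption: language and lexicalization are chosen uniformly at random.
        lang = rng.choice(sorted(lexicon[node]))
        sentence.append(rng.choice(lexicon[node][lang]))
        # Stop the walk with probability `restart_prob` (assumed value).
        if rng.random() < restart_prob:
            break
        node = rng.choice(graph[node])
    return sentence


def build_pseudo_corpus(graph, lexicon, n_walks, seed=0):
    """Generate `n_walks` mixed-language pseudo-sentences."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    return [
        random_walk(graph, lexicon, rng.choice(nodes), rng)
        for _ in range(n_walks)
    ]


corpus = build_pseudo_corpus(GRAPH, LEXICON, n_walks=5)
for sentence in corpus:
    print(" ".join(sentence))
```

Because words from both languages appear in the same pseudo-sentences, a standard Skip-gram model trained on such a corpus would place them in a single shared space, which is one plausible reading of learning the joint space "in one go".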

Journal: Knowledge-Based Systems
Publication Date: Mar 11, 2018
Citations: 19

Similar Papers

Random Walks and Neural Network Language Models on Knowledge Bases
Josu Goikoetxea ... Eneko Agirre
01 Jan 2015

Prix-LM: Pretraining for Multilingual Knowledge Base Construction
07 May 2022

Prix-LM: Pretraining for Multilingual Knowledge Base Construction
Wenxuan Zhou ... Ivan Vulić
01 Jan 2021

Antonyms are similar: Towards paradigmatic association approach to rating similarity in SimLex-999 and WordSim-353
Tomáš Kliegr ... Ondřej Zamazal
Data & Knowledge Engineering | VOL. 115
11 Apr 2018
