Abstract

This paper proposes a novel method for finding linear mappings among word embeddings for several languages, using a shared, multilingual embedding space as the pivot. Previous approaches learned translation matrices between two specific languages, whereas this method learns translation matrices between a given language and the shared, multilingual space. The system was trained first on bilingual and later on multilingual corpora. In the bilingual case, two different training datasets were used: Dinu's English–Italian benchmark data, and English–Italian translation pairs extracted from the PanLex database. In the multilingual case, only the PanLex database was used. With the best setting, the system performs significantly better on English–Italian than the baseline system of Mikolov, and its performance is comparable to that of more sophisticated systems. Exploiting the richness of the PanLex database, the proposed method makes it possible to learn linear mappings among an arbitrary number of languages.
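
The core building block described here, a linear map from one embedding space into a pivot space learned from seed translation pairs, can be sketched as an ordinary least-squares problem, in the spirit of Mikolov's baseline. The sketch below is illustrative only; the names (`learn_mapping`, `X_lang`, `Z_pivot`) and the dimensions are assumptions, not taken from the paper.

```python
import numpy as np

def learn_mapping(X_lang: np.ndarray, Z_pivot: np.ndarray) -> np.ndarray:
    """Learn a linear map W such that X_lang @ W approximates Z_pivot.

    X_lang:  (n, d_src) embeddings of n seed words in one language.
    Z_pivot: (n, d_piv) embeddings of the same words in the shared pivot space.
    Solved in closed form as an ordinary least-squares problem.
    """
    W, *_ = np.linalg.lstsq(X_lang, Z_pivot, rcond=None)
    return W  # shape (d_src, d_piv)

# Example with random stand-in data: 5,000 seed pairs, 300-dim vectors
# on both sides (dimensions chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 300))
Z = rng.standard_normal((5000, 300))
W = learn_mapping(X, Z)
assert W.shape == (300, 300)
```

Because each language gets its own matrix into the same pivot space, adding a new language only requires learning one more mapping, rather than one mapping per language pair.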

Highlights

  • Computer-driven natural language processing plays an increasingly important role in our everyday life

  • State-of-the-art systems represent word meaning with high dimensional vectors, known as word embeddings

  • One might ask whether the structure of the different embeddings, i.e., of the different meaning representations, is universal across all human languages

Introduction

Computer-driven natural language processing plays an increasingly important role in our everyday life. Studies that built graphs of word meanings across many languages found that these graphs reflected a certain structure of meaning with respect to the languages they were built from, and concluded that the structural properties of these graphs are consistent across different language groups and largely independent of geography, environment, and the presence or absence of literary traditions. Such findings led to a new research direction within the field of computational semantics, which focuses on the construction of universal meaning representations, most often in the form of cross-lingual word embedding models [2]. We do not choose any language to serve as the pivot space; instead, we create a language-independent space, i.e., a universal space, to which we map all languages.
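
One practical consequence of a language-independent pivot is that translation between any two languages reduces to nearest-neighbour search in the shared space. The following is a minimal sketch of that idea, assuming the per-language matrices have already been learned; all names (`translate`, `W_src`, `W_tgt`) are hypothetical.

```python
import numpy as np

def translate(word_vec, W_src, tgt_vecs, W_tgt, tgt_words):
    """Map a source word into the pivot space and return the target-language
    word whose pivot-space image is closest by cosine similarity."""
    q = word_vec @ W_src                       # source word in pivot space
    T = tgt_vecs @ W_tgt                       # target vocabulary in pivot space
    q = q / np.linalg.norm(q)
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    return tgt_words[int(np.argmax(T @ q))]
```

Neither language here plays a privileged role: both sides are projected into the universal space, so the same routine works for any pair of languages for which a mapping has been learned.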
