Abstract

This paper proposes a novel method for finding linear mappings among word embeddings for several languages, using a shared, multilingual embedding space as a pivot. Whereas previous approaches learned translation matrices between two specific languages, this method learns translation matrices between a given language and a shared, multilingual space. The system was trained first on bilingual and later on multilingual corpora. In the bilingual case, two different training sets were used: Dinu's English–Italian benchmark data, and English–Italian translation pairs extracted from the PanLex database. In the multilingual case, only the PanLex database was used. With its best setting, the system performs significantly better on the English–Italian language pair than the baseline system of Mikolov, and its performance is comparable to that of more sophisticated systems. Exploiting the richness of the PanLex database, the proposed method makes it possible to learn linear mappings among an arbitrary number of languages.
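To make the contrast concrete, the baseline mentioned above learns a single linear map between two languages from a seed dictionary of translation pairs. The sketch below is an illustration of that least-squares formulation with random placeholder data; it is not the authors' implementation, and all sizes and variable names are assumptions.

```python
# Minimal sketch (not the paper's code) of the Mikolov-style baseline the
# abstract compares against: learn a linear translation matrix W from
# word-translation pairs by ordinary least squares. All data below are
# random placeholders standing in for real embeddings.
import numpy as np

rng = np.random.default_rng(0)

n, d_src, d_tgt = 1000, 300, 300          # assumed toy sizes
X = rng.normal(size=(n, d_src))           # source-word embeddings
Z = rng.normal(size=(n, d_tgt))           # embeddings of their translations

# Solve min_W ||X W - Z||_F^2 in closed form.
W, *_ = np.linalg.lstsq(X, Z, rcond=None)

# To translate a new source word, map its embedding and look for the
# nearest neighbors of the result in the target space.
x_new = rng.normal(size=d_src)
z_pred = x_new @ W
```

In the proposed method, the target side Z would be the shared, multilingual pivot space rather than a second language.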

Highlights

  • Computer-driven natural language processing plays an increasingly important role in our everyday life

  • State-of-the-art systems represent word meaning with high dimensional vectors, known as word embeddings

  • One might ask whether the structure of the different embeddings, i.e., of the different meaning representations, is universal across all human languages


Summary

Introduction

Computer-driven natural language processing plays an increasingly important role in our everyday life. Earlier work built semantic graphs from translations of concepts across many languages, and found that these graphs reflected a certain structure of meaning with respect to the languages they were built from. The authors concluded that the structural properties of these graphs are consistent across different language groups, and largely independent of geography, environment, and the presence or absence of literary traditions. Such findings led to a new research direction within the field of computational semantics, which focuses on the construction of universal meaning representations, most often in the form of cross-lingual word embedding models [2]. We do not choose any language to serve as a pivot space; instead, we create a language-independent, i.e., universal, space into which we map all languages.
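As a rough illustration of this universal-space idea, the following sketch alternates between estimating a language-independent vector for every concept and refitting each language's linear map toward those vectors. This is an assumed formulation for illustration only; the update scheme, names, and sizes are not taken from the paper.

```python
# Hedged sketch of the pivot idea (an illustrative formulation, not the
# authors' published algorithm): alternately estimate a language-independent
# vector for each concept and a least-squares map W_l for each language, so
# that W_l maps every word embedding close to its concept's pivot vector.
import numpy as np

rng = np.random.default_rng(1)
n_concepts, n_langs, d = 500, 3, 300      # assumed toy sizes

# emb[l][c] is the (placeholder) embedding of concept c in language l.
emb = [rng.normal(size=(n_concepts, d)) for _ in range(n_langs)]

W = [np.eye(d) for _ in range(n_langs)]   # per-language maps
for _ in range(10):                       # a few alternating updates
    # 1) Pivot vectors: average of all languages' mapped embeddings,
    #    renormalized so the trivial all-zero solution is avoided.
    S = np.mean([emb[l] @ W[l] for l in range(n_langs)], axis=0)
    S /= np.linalg.norm(S, axis=1, keepdims=True)
    # 2) Per-language maps: least squares toward the current pivot vectors.
    for l in range(n_langs):
        W[l], *_ = np.linalg.lstsq(emb[l], S, rcond=None)
```

Averaging the mapped embeddings treats no single language as the pivot, which mirrors the stated goal of a language-independent space.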

