Abstract

Projection-based methods for generating high-quality Cross-Lingual Embeddings (CLEs) have shown state-of-the-art performance in many multilingual applications. Supervised methods that rely on character-level information and unsupervised methods that need only monolingual information are both popular, and each has its pros and cons. However, challenges remain in the quality of the monolingual word embedding spaces and in the generation of the seed dictionaries. In this work, we aim to generate effective CLEs with the aid of auxiliary topic models. We utilize both monolingual and bilingual topic models when generating the monolingual embedding spaces and the seed dictionaries used for projection. We present a comprehensive evaluation of our proposed model on bilingual lexicon extraction, cross-lingual semantic word similarity, and cross-lingual document classification tasks. We show that our proposed model outperforms existing supervised and unsupervised CLE models built on basic monolingual embedding spaces and seed dictionaries. It also exceeds CLE models generated from representative monolingual topical word embeddings.
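For context, projection-based CLE methods typically learn a linear map between two monolingual embedding spaces from a seed dictionary of translation pairs, commonly via orthogonal Procrustes. The sketch below illustrates only that generic projection step, not the topic-model-based construction proposed in the paper; the array shapes and random vectors are hypothetical placeholders for real seed-pair embeddings.

```python
import numpy as np

def procrustes_projection(src_vecs, tgt_vecs):
    """Learn an orthogonal map W minimizing ||src_vecs @ W - tgt_vecs||_F
    (standard orthogonal Procrustes solution via SVD)."""
    m = src_vecs.T @ tgt_vecs
    u, _, vt = np.linalg.svd(m)
    return u @ vt

# Toy usage: random vectors stand in for the embeddings of seed-dictionary pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))   # source-language vectors for seed pairs
Y = rng.normal(size=(1000, 300))   # target-language vectors for seed pairs
W = procrustes_projection(X, Y)
mapped = X @ W                      # source space projected into the target space
```

The quality of such a projection depends directly on the two inputs the abstract highlights: the monolingual embedding spaces and the seed dictionary.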
