Cross-Lingual Embeddings Using a Temporally Aligned Comparable Corpus: A Case Study For Manipuri–English

Lenin Laitonjam, Sanasam Ranbir Singh

doi:10.1142/s2717554522500084

Abstract

Cross-lingual embeddings facilitate cross-language learning, bridging the gap between high-resource and low-resource languages. This study presents and evaluates unsupervised methods for generating cross-lingual embeddings for the low-resource Manipuri–English language pair. Manipuri is a resource-poor language spoken in India’s northeastern regions. The embeddings are evaluated on the bilingual dictionary induction task for this language pair. Furthermore, we propose a method to improve the cross-lingual embeddings by exploiting a temporally aligned comparable corpus. Lack of supervision is a persistent issue for learning models, especially in low-resource settings. The proposed method takes advantage of the temporal alignments to provide the much-needed supervision and improve the alignment between the Manipuri and English embeddings. Across various experiments, the proposed model consistently outperforms all the corresponding baselines.

Full Text