Abstract
Word translation, or the incorporation of bilingual dictionaries, is an important capability that affects many multilingual language processing tasks. Translation from one language to another has typically relied on either parallel data or bilingual dictionaries. In this paper, we address this problem and generate cross-lingual word embeddings for the English-Hindi language pair without using any document-aligned or sentence-aligned corpus, or any bilingual dictionary. We follow the assumption of similar intra-lingual similarity distributions: for the most frequent words, the similarity distribution graphs of the English and Hindi corpora are similar, and the two embedding spaces are isometric. The resulting cross-lingual word embeddings can be used for unsupervised neural machine translation and cross-lingual transfer learning. We compare several retrieval techniques (nearest neighbour, inverted nearest neighbour, inverted softmax, and cross-lingual word scaling) on English-Hindi bilingual embeddings trained both in an unsupervised setting and in a semi-supervised setting that uses a seed dictionary. The bilingual word embeddings are evaluated on a generated English-Hindi dictionary.
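The retrieval criteria compared in this work can be illustrated with a minimal pure-Python sketch over a precomputed cosine-similarity matrix. It assumes the English and Hindi embeddings have already been mapped into a shared space, with rows indexing source (English) words and columns indexing target (Hindi) words. The function names, the temperature `beta` and the neighbourhood size `k` are illustrative choices, not the paper's settings. The formulas follow the common definitions of these criteria in the bilingual lexicon induction literature. We also assume that "cross-lingual word scaling" refers to cross-domain similarity local scaling (CSLS), which penalises "hub" targets that are close to many source words.

```python
import math
from typing import List

# S[i][j] = cosine similarity between source word i (e.g. English) and
# target word j (e.g. Hindi); both spaces are assumed to be already aligned.
Matrix = List[List[float]]


def nearest_neighbour(S: Matrix) -> List[int]:
    """Plain NN: for each source word, pick the most similar target word."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in S]


def _source_rank(S: Matrix, i: int, j: int) -> int:
    """Rank of source word i among all source words as neighbours of target j (0 = closest)."""
    return sum(1 for k in range(len(S)) if S[k][j] > S[i][j])


def inverted_nearest_neighbour(S: Matrix) -> List[int]:
    """Pick the target j for which source i ranks highest among j's source
    neighbours; ties are broken by the raw similarity."""
    n_tgt = len(S[0])
    return [
        min(range(n_tgt), key=lambda j: (_source_rank(S, i, j), -S[i][j]))
        for i in range(len(S))
    ]


def inverted_softmax(S: Matrix, beta: float = 10.0) -> List[int]:
    """Normalise exp(beta * S[i][j]) over *source* words, so targets that are
    close to many source words (hubs) receive lower scores."""
    n_src, n_tgt = len(S), len(S[0])
    alpha = [sum(math.exp(beta * S[i][j]) for i in range(n_src)) for j in range(n_tgt)]
    return [
        max(range(n_tgt), key=lambda j: math.exp(beta * S[i][j]) / alpha[j])
        for i in range(n_src)
    ]


def _mean_top_k(values: List[float], k: int) -> float:
    """Mean of the k largest values (or of all values if fewer than k)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)


def csls(S: Matrix, k: int = 10) -> List[int]:
    """CSLS(x_i, y_j) = 2*cos(x_i, y_j) - r_T(x_i) - r_S(y_j), where r_T and r_S
    are mean similarities to the k nearest target / source neighbours."""
    n_src, n_tgt = len(S), len(S[0])
    r_t = [_mean_top_k(S[i], k) for i in range(n_src)]
    r_s = [_mean_top_k([S[i][j] for i in range(n_src)], k) for j in range(n_tgt)]
    return [
        max(range(n_tgt), key=lambda j: 2 * S[i][j] - r_t[i] - r_s[j])
        for i in range(n_src)
    ]


if __name__ == "__main__":
    # Toy matrix: target 0 is a "hub" that is close to every source word;
    # the intended translations lie on the diagonal.
    S = [
        [0.90, 0.20, 0.10],
        [0.80, 0.70, 0.10],
        [0.75, 0.10, 0.60],
    ]
    print(nearest_neighbour(S))            # [0, 0, 0]
    print(inverted_nearest_neighbour(S))   # [0, 1, 2]
    print(inverted_softmax(S, beta=10.0))  # [0, 1, 2]
    print(csls(S, k=2))                    # [0, 1, 2]
```

In this toy example, plain nearest-neighbour retrieval maps every source word to the hub target. The other three criteria each correct for hubness in a different way and recover the diagonal. Inverted nearest neighbour uses ranks, inverted softmax uses normalisation over sources, and CSLS uses local neighbourhood density.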