Abstract

Recent work has succeeded in learning cross-lingual word embeddings (CLWEs) in an unsupervised manner. As a prominent class of unsupervised models, generative adversarial networks (GANs) have been heavily studied for unsupervised CLWE learning, where they align the embedding spaces of different languages. Because they disturb the embedding distribution, the embeddings of low-frequency words (LFEs) are usually treated as noise in the alignment process. To alleviate the impact of LFEs, existing GAN-based models use a heuristic rule that aggressively samples the embeddings of high-frequency words (HFEs). However, this sampling rule lacks theoretical support. In this paper, we propose a novel GAN-based model that learns cross-lingual word embeddings without any parallel resources. To address the noise problem caused by the LFEs, we inject perturbations into the LFEs to offset the distribution disturbance. In addition, we design a modified framework based on Cramer GAN to train the perturbed LFEs and the HFEs jointly. Empirical evaluation on bilingual lexicon induction demonstrates that the proposed model outperforms the state-of-the-art GAN-based model on several language pairs.

