Abstract
In machine word translation, Nearest Neighbor (NN) retrieval searches a unified multilingual semantic feature space for the top-k translation candidates of a source query. However, NN retrieval is polluted by hubs in the high-dimensional feature space. Many existing approaches relieve this problem by removing hubs from the list of translation candidates. These approaches are flawed, because hubs are words that also have corresponding translations of their own. To address this issue, we propose a novel Reconstructed Similarity (RS) retrieval for the neural machine word translation model, which mitigates the hubness problem regardless of whether a candidate is a hub. Unlike previous work, RS reduces the impact of hubness pollution in dense, high-dimensional space and gives hubs the same probability as the other target candidates instead of inappropriately excluding them. In addition, RS improves the quality of bilingual dictionaries by measuring the bilateral similarity of the bilingual and monolingual distances for each source query embedding. To model unsupervised machine word translation, we introduce Generative Adversarial Nets (GANs) that map the source and target word distributions into a shared semantic space. We also construct a tiny GAN topology for neural machine word translation, which is at least 52× faster than previous GAN-based models. To further align the cross-lingual embedding distributions, we provide orthogonal Procrustes mapping, global awareness of the transformation matrix, and rescaling of the target embeddings as flexible, optional refinement steps. The results show that our model outperforms the state of the art by nearly 4% on distant language pairs such as English to Finnish. Compared with a precision@1 of 47.00% for English to Finnish, our model obtains a precision@1 of 47.53% and achieves state-of-the-art results in a fully unsupervised form. Moreover, our model achieves competitive results in the shortest time among GAN-based models and offers an easy trade-off between speed and accuracy.
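To make the terms NN retrieval, hubs, orthogonal Procrustes mapping and precision@1 concrete, the following is a minimal sketch on synthetic data. It is not the RS retrieval or the GAN proposed here, neither of which is specified in this abstract. All function names, the toy embeddings and the cosine similarity measure are assumptions made for illustration only.

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F for row-wise paired embeddings."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def normalize(E):
    """Scale every row to unit length so dot products become cosines."""
    return E / np.linalg.norm(E, axis=1, keepdims=True)

def nn_retrieve(src, tgt, k=1):
    """Indices of the k most cosine-similar target rows for each source row."""
    sims = normalize(src) @ normalize(tgt).T
    return np.argsort(-sims, axis=1)[:, :k]

def precision_at_1(pred, gold):
    """Fraction of queries whose top candidate is the gold translation."""
    return float(np.mean(pred[:, 0] == gold))

# Toy data (assumption): the "source" space is a hidden rotation of the "target" space.
rng = np.random.default_rng(0)
d, n = 8, 50
Y = rng.normal(size=(n, d))                    # target embeddings
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # hidden orthogonal matrix
X = Y @ Q.T                                    # source embeddings

W = procrustes(X, Y)                           # learn the mapping from paired rows
pred = nn_retrieve(X @ W, Y)                   # translate by nearest neighbor
p1 = precision_at_1(pred, np.arange(n))

# How often each target is returned as a top-1 candidate; a hub would be a
# target that is returned for many different queries.
hub_counts = np.bincount(pred[:, 0], minlength=n)
```

In this noise-free toy setting the mapping is recovered exactly, so no target is over-retrieved. With real embeddings, `hub_counts` is typically skewed, and that skew is the hubness that a retrieval criterion has to handle.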