Abstract

Unsupervised machine translation has been proposed to address the absence of parallel corpora for English translation. Unsupervised pretraining, denoising autoencoders, back-translation, and shared latent representation mechanisms make it possible to learn the translation task from monolingual corpora alone. This paper constructs an unsupervised neural machine translation (NMT) system based on pseudo-parallel data and analyzes its behavior on dissimilar language pairs. It first analyzes the low performance of unsupervised translation on dissimilar language pairs from three aspects: bilingual word embedding quality, shared words, and word order. Artificial shared-word replacement and preordering strategies are then proposed to increase the number of shared words between dissimilar language pairs and to reduce the difference in their syntactic structure, thereby improving translation performance on such pairs. The denoising autoencoder and the shared latent representation mechanism in unsupervised machine translation are needed only in the early stage of training, and learning a shared latent representation limits further performance gains in the two translation directions. Moreover, training the denoising autoencoder by repeatedly corrupting the training data slows model convergence, especially for dissimilar languages. To address these issues, this paper presents an unsupervised NMT model based on pseudo-parallel data: the pseudo-parallel corpus generated by the unsupervised NMT system is used to train two standard supervised NMT models, which improves translation performance and speeds up convergence. Finally, the English intelligent translation model is deployed on a wireless network server, and users can access it through the wireless network.
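The two training ingredients the abstract discusses can be sketched briefly: a noise function that corrupts monolingual sentences for denoising-autoencoder training, and the construction of pseudo-parallel pairs by back-translating monolingual target sentences with the current model. This is a minimal illustrative sketch, not the paper's implementation; the function names, the noise parameters, and the toy `translate` stand-in are all assumptions.

```python
import random

def add_noise(tokens, drop_prob=0.1, shuffle_k=3, rng=None):
    """Corrupt a sentence for denoising-autoencoder training:
    randomly drop words, then locally shuffle the remainder so that
    each token moves at most about shuffle_k positions."""
    rng = rng or random.Random(0)
    # Word dropout: remove each token with probability drop_prob
    # (keep at least one token so the input is never empty).
    kept = [t for t in tokens if rng.random() > drop_prob] or tokens[:1]
    # Local shuffle: perturb each index by a small random offset and re-sort.
    keys = [i + rng.uniform(0, shuffle_k) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda p: p[0])]

def make_pseudo_parallel(mono_sentences, translate):
    """Back-translate monolingual target sentences with the current model
    to build (synthetic source, real target) pairs, which can then train
    a standard supervised NMT model."""
    return [(translate(s), s) for s in mono_sentences]

# Usage with a stand-in "translation" function (word reversal):
pairs = make_pseudo_parallel(
    ["the cat sat"], lambda s: " ".join(reversed(s.split()))
)
print(pairs)  # [('sat cat the', 'the cat sat')]
```

In the scheme the abstract describes, `translate` would be the current unsupervised NMT system, and the resulting pairs would replace the denoising and shared-representation objectives after the early stage of training.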
