ABSTRACT The rapid growth of communication technology has brought nations and their cultures closer together, and the demand for cross-language communication has risen sharply. Among the learning methods that connect a source language to a target language, unsupervised learning is especially valuable for low-resource languages. However, unsupervised machine translation remains problematic for languages that are both morphologically rich and low-resource, and results are particularly poor when translating from a morphologically less complex language into a morphologically more complex one. In this paper, we improve unsupervised neural machine translation for morphologically rich languages by tackling two issues: the ambiguity problem and the quality of the pseudo-parallel sentence pairs generated through back-translation. We address the ambiguity problem by using cross-lingual sense embeddings on the source side instead of cross-lingual word embeddings. We raise the quality of the pseudo-parallel sentences by giving more weight to better pseudo-parallel sentence pairs in the back-translation step. We use several evaluation metrics to check the robustness of the model and compare it against several baseline models. Experiments are performed on the morphologically rich language pairs English-Hindi, English-Tamil, and English-Telugu, and on Kangri, a low-resource endangered language.