Abstract

This paper proposes a solution to the unknown-word problem in neural machine translation (NMT), demonstrated on the Kazakh-English language pair. The novelty of the approach is an algorithm that looks up unknown words in the vocabulary of a trained NMT model using a dictionary of Kazakh synonyms. Once the unknown words in a source sentence have been identified, the synonym dictionary is used to find words with a similar meaning. The synonyms found are then checked for presence in the vocabulary of the trained NMT model, and the edited source sentence is translated again. A base of Kazakh synonyms was collected for this purpose; it contains 1995 synonymous words. The software implementing the solution was developed in Python. The proposed approach was tested on two Kazakh-English parallel corpora (the KAZNU Kazakh-English parallel corpus and the WMT19 Kazakh-English parallel corpus) in two variants: a baseline and a system using the proposed technology.
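
The synonym-substitution step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `replace_unknown_words`, its parameters, and the placeholder tokens are hypothetical, and the sketch assumes that the sentence is already tokenized, that the model vocabulary is available as a set, and that the first in-vocabulary synonym is the one chosen (the abstract does not say how a synonym is selected when several qualify).

```python
from typing import Dict, List, Sequence, Set


def replace_unknown_words(
    tokens: Sequence[str],
    model_vocab: Set[str],
    synonyms: Dict[str, Sequence[str]],
) -> List[str]:
    """Replace each out-of-vocabulary token with an in-vocabulary synonym.

    Hypothetical helper: tokens without a usable synonym are left unchanged.
    """
    edited: List[str] = []
    for token in tokens:
        if token in model_vocab:
            # Known word: keep it as is.
            edited.append(token)
            continue
        # Unknown word: pick the first listed synonym the model knows
        # (an assumed selection rule), or fall back to the original token.
        replacement = next(
            (s for s in synonyms.get(token, ()) if s in model_vocab),
            token,
        )
        edited.append(replacement)
    return edited


# Placeholder tokens stand in for Kazakh words.
vocab = {"a", "b", "c"}
synonym_dict = {"x": ["y", "b"]}
edited_sentence = replace_unknown_words(["a", "x", "z"], vocab, synonym_dict)
```

In this toy run, `x` is unknown and its synonym `b` is in the vocabulary, so it is substituted; `z` has no entry in the synonym dictionary and stays as it is. The edited token list would then be passed to the trained model for a second translation.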
