Abstract

ABSTRACT The paper proposes a solution to the problem of unknown words for neural machine translation (NMT). The proposed solution is shown by the example of NMT of the Kazakh-English language pair. The novelty of the proposed technology for solving the problem of unknown words in the NMT of the Kazakh language is an algorithm proposed for searching for unknown words in the dictionary of a trained model of NMT and using the dictionary of synonyms of the Kazakh to replace an unknown word with a word that is close in meaning. A dictionary of synonyms is used to search for words that are similar in meaning to the unknown words, which was defined. Moreover, the found synonyms are checked for the presence in the vocabulary of a trained model. After that, a new translation of the edited sentence of the source language is performed. The base of words-synonyms of the Kazakh language is collected. Software solutions to the unknown word problem have been developed in the Python. The proposed technology solution to the problem of unknown words was tested on the two parallel Kazakh-English corpus in both variants: baseline NMT and NMT with using of the proposed technology.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call