Abstract

Unknown words in neural machine translation (NMT) can undermine the structural integrity of a sentence, increase ambiguity, and degrade translation quality. To address this problem, we propose a method for handling unknown words in NMT that integrates syntactic structure and semantic concepts. First, a semantic concept network is used to construct, for each unknown word, a set of in-vocabulary synonyms. Second, we propose a semantic similarity measure based on syntactic structure and semantic concepts; the best substitute is selected from the set of in-vocabulary synonyms by computing the semantic similarity between the unknown word and each candidate substitute. English-Chinese translation experiments demonstrate that this method preserves the semantic integrity of source-language sentences. In terms of performance, the proposed method improves on the conventional NMT method by 2.9 BLEU points, and on the traditional method of positioning the UNK token using word alignment information by 0.95 BLEU points.
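
The two steps described above (build a set of in-vocabulary synonyms, then pick the most similar one) can be illustrated with a minimal sketch. It is not the paper's implementation: the concept network `CONCEPT_NET`, the score table `TOY_SCORES`, the function names, and the fallback to `<unk>` are all hypothetical, and `toy_similarity` is a stand-in for the paper's similarity measure over syntactic structure and semantic concepts, which the abstract does not specify.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy concept network: word -> related words.
# A real system would query a semantic resource instead.
CONCEPT_NET: Dict[str, List[str]] = {
    "canine": ["dog", "hound", "wolf"],
    "automobile": ["car", "vehicle"],
}

# Hypothetical hand-written scores standing in for the paper's
# syntax- and concept-based similarity measure.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("canine", "dog"): 0.9,
    ("canine", "wolf"): 0.6,
}

# Signature: (unknown word, candidate, sentence tokens, position) -> score
Similarity = Callable[[str, str, List[str], int], float]


def toy_similarity(unknown: str, candidate: str,
                   tokens: List[str], position: int) -> float:
    """Placeholder similarity; ignores the sentence context entirely."""
    return TOY_SCORES.get((unknown, candidate), 0.0)


def candidate_substitutes(word: str, vocab: Set[str],
                          concept_net: Dict[str, List[str]]) -> List[str]:
    """Step 1: keep only the related words that are in the NMT vocabulary."""
    return [w for w in concept_net.get(word, []) if w in vocab]


def replace_unknown_words(tokens: List[str], vocab: Set[str],
                          concept_net: Dict[str, List[str]],
                          similarity: Similarity) -> List[str]:
    """Step 2: replace each unknown word with its best-scoring candidate."""
    result: List[str] = []
    for position, token in enumerate(tokens):
        if token in vocab:
            result.append(token)
            continue
        candidates = candidate_substitutes(token, vocab, concept_net)
        if not candidates:
            # Assumption: fall back to a generic unknown-word token.
            result.append("<unk>")
            continue
        best = max(candidates,
                   key=lambda c: similarity(token, c, tokens, position))
        result.append(best)
    return result


if __name__ == "__main__":
    vocab = {"the", "dog", "wolf", "car", "barked"}
    sentence = ["the", "canine", "barked"]
    print(replace_unknown_words(sentence, vocab, CONCEPT_NET, toy_similarity))
```

Here "hound" is discarded because it is out of vocabulary, and "dog" is chosen over "wolf" because it scores higher. Passing the similarity function as a parameter keeps the candidate construction independent of how similarity is actually computed.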
