Abstract

Word alignment is a critical part of a statistical machine translation system. The translation model and the reordering model are both built from word alignment results, so errors made at the alignment stage propagate into these models and may be amplified there. Research on word alignment also provides a foundation for corpus construction, speech recognition, bilingual dictionary compilation, and information retrieval in natural language processing. Research on Chinese-Uighur word alignment, however, started relatively late. We study sentence-level Chinese-Uighur word alignment and apply a bilingual corpus filtering method based on alignment perplexity to this task. Experimental results show that the method is feasible and achieves the expected results at this preliminary stage, providing a basis for follow-up research on word alignment techniques.
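The abstract does not define alignment perplexity, so the following is only a minimal sketch of how perplexity-based corpus filtering might look, not the method used in this work. It assumes an IBM Model 1 style lexical translation table has already been estimated; the names `alignment_perplexity`, `filter_corpus`, `t_table`, `floor`, and `threshold` are hypothetical and introduced purely for illustration.

```python
import math


def alignment_perplexity(src_tokens, tgt_tokens, t_table, floor=1e-9):
    """Per-target-word perplexity of a sentence pair (assumed definition).

    t_table maps (source_word, target_word) -> lexical translation
    probability. "<NULL>" stands for the empty source word, as in
    IBM Model 1. Unseen word pairs contribute probability 0 and the
    per-word probability is floored to keep the logarithm finite.
    """
    src = ["<NULL>"] + list(src_tokens)
    log_prob = 0.0
    for tgt in tgt_tokens:
        # Model 1 style: average the lexical probability over all source positions.
        p = sum(t_table.get((s, tgt), 0.0) for s in src) / len(src)
        log_prob += math.log(max(p, floor))
    return math.exp(-log_prob / max(len(tgt_tokens), 1))


def filter_corpus(pairs, t_table, threshold):
    """Keep only sentence pairs whose alignment perplexity is at most threshold."""
    return [
        (src, tgt)
        for src, tgt in pairs
        if alignment_perplexity(src, tgt, t_table) <= threshold
    ]
```

Under this assumed definition, a low perplexity means the lexical table explains the target sentence well given the source sentence. Pairs above the threshold would be treated as likely misaligned and dropped before word alignment training. The threshold itself would have to be tuned on held-out data.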
