Abstract

Sentence-aligned bilingual parallel corpora are an indispensable resource for machine translation, translation knowledge acquisition, and bilingual dictionary compilation. Working with a Korean-Chinese parallel corpus, this paper applies a sentence alignment algorithm based on character length to align Korean and Chinese sentences automatically, and proposes a method for evaluating the resulting alignments. First, the Korean-Chinese corpus is preprocessed and segmented. Second, the mean and variance of the corpus distribution are calculated from the Korean-Chinese corpus, and a probability score is used within a dynamic programming framework to find the maximum-likelihood sentence alignment. Finally, a sentence alignment judgment method is proposed: the Hanjaja tool converts the Sino-Korean words in each Korean sentence into Chinese words, producing a Korean sentence that contains Chinese words (abbreviated as a C-K sentence); the Jaccard coefficient between the C-K sentence and the Chinese sentence is then computed, and an appropriately chosen threshold determines automatically whether the pair is aligned. Experiments show that the length-based method works well for automatic alignment of Korean-Chinese sentences, reaching a sentence alignment accuracy of 88.61%. The proposed sentence alignment judgment method is simple and effective.
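The judgment step can be illustrated with a minimal sketch. It assumes the Sino-Korean words have already been converted, so that the C-K sentence is available as a string, and that both sentences use the same Chinese script. The abstract does not say which units the coefficient is computed over or what threshold is used, so the character-set comparison, the function names `jaccard` and `is_aligned`, and the threshold value 0.2 are all illustrative assumptions rather than the paper's settings.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over the sets of non-space characters.

    Comparing character sets is an assumption; the paper may use other units.
    """
    set_a = {ch for ch in a if not ch.isspace()}
    set_b = {ch for ch in b if not ch.isspace()}
    if not set_a and not set_b:
        return 0.0  # two empty sentences: treat as no overlap
    return len(set_a & set_b) / len(set_a | set_b)


def is_aligned(ck_sentence: str, zh_sentence: str, threshold: float = 0.2) -> bool:
    """Judge a candidate pair as aligned when the coefficient reaches the threshold.

    The default threshold is a placeholder, not the value chosen in the paper.
    """
    return jaccard(ck_sentence, zh_sentence) >= threshold


# Hypothetical pair: a C-K sentence (Hanja already substituted) and a Chinese sentence.
ck = "學校에 간다"
zh = "去學校"
print(jaccard(ck, zh), is_aligned(ck, zh))
```

In practice the threshold would be tuned on pairs whose alignment status is known, since the share of Sino-Korean vocabulary varies from sentence to sentence.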
