Abstract

Parallel corpora are essential resources for building bilingual term dictionaries of historical classics. To obtain large-scale parallel corpora, this paper proposes a sentence alignment method based on mode prediction and term translation pairs. On the one hand, the method restructures the sentence alignment process according to the characteristics of translations of historical classics and incorporates mode prediction into sentence alignment. On the other hand, because no bilingual dictionary of ancient Chinese is available, the method aligns sentences using term translation pairs extracted from manually aligned sentence pairs. The method first predicts the probability of each alignment mode from the number of characters, the number of punctuation marks, and certain characters in the Chinese sentence. It then performs sentence alignment using the length alignment probability, the term alignment probability, and the mode probability. In addition, the method selects anchor sentence pairs based on sentence length and predicted mode to prevent alignment errors from propagating. Experiments on "Shi Ji" show that mode prediction and term translation pairs each clearly improve sentence alignment performance.
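The abstract does not give the scoring formulas, so the following is only a minimal sketch of how the three probabilities could be combined, assuming a Gale-Church style dynamic program over alignment beads where each bead costs -log(P_len * P_term * P_mode). The mode set, the length-ratio and variance parameters, the default mode priors, the smoothed term score, and the names `align`, `length_prob`, `term_prob`, `mode_prob` and `term_pairs` are all hypothetical placeholders, not the paper's method. The mode predictor and the anchor-pair selection step are omitted; predicted mode probabilities can be passed in as `mode_probs`.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Mode = Tuple[int, int]  # (number of Chinese sentences, number of English sentences)

# Hypothetical set of alignment modes (beads); the paper's actual mode set is not given.
MODES: List[Mode] = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]

# Hypothetical prior mode probabilities, used when no predicted probabilities are supplied.
DEFAULT_MODE_PROBS: Dict[Mode, float] = {
    (1, 1): 0.85, (1, 0): 0.01, (0, 1): 0.01, (2, 1): 0.065, (1, 2): 0.065,
}


def length_prob(zh_len: int, en_len: int, ratio: float = 3.0, var: float = 6.8) -> float:
    """Gale-Church style length probability 2 * (1 - Phi(|delta|)).

    `ratio` (English characters per Chinese character) and `var` are hypothetical
    placeholder parameters, not values from the paper.
    """
    if zh_len == 0 or en_len == 0:
        return 0.01  # hypothetical fixed score for insertions/deletions
    delta = (en_len - zh_len * ratio) / math.sqrt(zh_len * var)
    return math.erfc(abs(delta) / math.sqrt(2))  # equals 2 * (1 - Phi(|delta|))


def term_prob(zh_seg: str, en_seg: str, term_pairs: Dict[str, str]) -> float:
    """Smoothed fraction of Chinese terms in zh_seg whose translation occurs in en_seg."""
    en_lower = en_seg.lower()
    found = matched = 0
    for zh_term, en_term in term_pairs.items():
        if zh_term in zh_seg:
            found += 1
            if en_term.lower() in en_lower:
                matched += 1
    return (matched + 1) / (found + 2)  # Laplace smoothing: 0.5 when no terms occur


def mode_prob(mode_probs: Optional[Sequence[Dict[Mode, float]]], i: int, mode: Mode) -> float:
    """Predicted probability of `mode` for the bead starting at Chinese sentence i."""
    if mode_probs is None or i >= len(mode_probs):
        return DEFAULT_MODE_PROBS[mode]
    return mode_probs[i].get(mode, 1e-6)


def align(zh: List[str], en: List[str], term_pairs: Dict[str, str],
          mode_probs: Optional[Sequence[Dict[Mode, float]]] = None
          ) -> List[Tuple[List[int], List[int]]]:
    """Minimum-cost alignment where each bead costs -log(P_len * P_term * P_mode)."""
    n, m = len(zh), len(en)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor of (i, j) is final before (i, j) is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for dz, de in MODES:
                ni, nj = i + dz, j + de
                if ni > n or nj > m:
                    continue
                zh_seg = "".join(zh[i:ni])
                en_seg = " ".join(en[j:nj])
                p = (length_prob(len(zh_seg), len(en_seg))
                     * term_prob(zh_seg, en_seg, term_pairs)
                     * mode_prob(mode_probs, i, (dz, de)))
                c = cost[i][j] - math.log(max(p, 1e-300))  # guard against log(0)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace back from (n, m) to recover the beads as lists of sentence indices.
    beads: List[Tuple[List[int], List[int]]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    # Toy sentences and term pairs, illustrative only (not the paper's data).
    zh = ["项羽者，下相人也。", "初起时，年二十四。"]
    en = ["Xiang Yu was a native of Xiangxia.", "When he first rose up, he was twenty-four."]
    pairs = {"项羽": "Xiang Yu", "下相": "Xiangxia"}
    print(align(zh, en, pairs))  # [([0], [0]), ([1], [1])]
```

Multiplying the three probabilities treats them as independent evidence, which is the simplest possible combination; a weighted log-linear sum would be an equally plausible reading of the abstract. In a fuller pipeline, `mode_probs` would hold one dictionary per Chinese sentence produced by the mode predictor, and selected anchor pairs would split the text into shorter segments aligned separately, so that an error in one segment cannot propagate into the next.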
