Abstract

This paper presents a project to align and match bilingual English–Chinese news files downloaded from the China News Service's website. The work involves aligning bilingual texts at the sentence and clause levels. In addition, the files themselves must be matched, because the English and Chinese news files downloaded from the web do not come in the same sequential order. These news files have characteristics of their own, and file matching poses difficulties of its own beyond the known alignment problems previously reported in the literature. To align the news files, we combine two criteria: "anchors" (i.e. unambiguously corresponding text elements) and sentence length. We employ Dynamic Programming first to align at the paragraph level and then to align at the sentence-clause level. The precision and recall of the alignment are satisfactory for freely translated texts. To match English and Chinese files, we use anchors alone. In file matching we encounter a "collision" problem caused by contending match candidates, and we propose a recursive splitting algorithm to resolve it. We allow human intervention to improve the precision of matching, and we achieve 100% precision with a fairly small amount of manual effort. Finally, to determine the various parameters used in aligning and matching, we use a Genetic Algorithm software package to obtain their optimized values.
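A minimal Python sketch of how Dynamic Programming can combine a sentence-length criterion with anchors follows. It is not the authors' cost function. It assumes that anchors are numerals appearing on both sides, measures length in characters scaled by a hypothetical ratio `RATIO`, and allows Gale-Church-style 1-1, 1-0, 0-1, 2-1 and 1-2 beads with hypothetical penalties. `RATIO`, `ANCHOR_WEIGHT` and `BEAD_PENALTY` are placeholders for the kind of parameters the paper tunes with a Genetic Algorithm; the function names `anchors`, `bead_cost` and `align` are illustrative only.

```python
import math
import re
from typing import List, Optional, Set, Tuple

# Hypothetical parameters (assumptions, not the paper's values); the paper
# obtains its own parameter values with a Genetic Algorithm.
RATIO = 2.0          # assumed English characters per Chinese character
ANCHOR_WEIGHT = 1.0  # assumed cost reduction per shared anchor
BEAD_PENALTY = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}

NUM_RE = re.compile(r"\d+(?:\.\d+)?")


def anchors(text: str) -> Set[str]:
    """Assumed anchor type: numerals, which usually appear unchanged in both languages."""
    return set(NUM_RE.findall(text))


def bead_cost(en: List[str], zh: List[str], di: int, dj: int) -> float:
    """Cost of aligning di English sentences with dj Chinese sentences."""
    en_text = " ".join(en)
    zh_text = "".join(zh)
    # Length criterion: log ratio of English length to scaled Chinese length.
    length_cost = abs(math.log((len(en_text) + 1) / (RATIO * len(zh_text) + 1)))
    # Anchor criterion: every numeral found on both sides lowers the cost.
    shared = len(anchors(en_text) & anchors(zh_text))
    return length_cost - ANCHOR_WEIGHT * shared + BEAD_PENALTY[(di, dj)]


def align(en: List[str], zh: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Minimum-cost sequence of beads covering both sentence lists."""
    n, m = len(en), len(zh)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEAD_PENALTY:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(en[i:ni], zh[j:nj], di, dj)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace back from (n, m) to recover the chosen beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    en = ["The GDP grew 7 percent in 2003.", "Exports rose sharply."]
    zh = ["2003年国内生产总值增长7%。", "出口大幅增长。"]
    print(align(en, zh))
```

In this toy input the shared numerals 2003 and 7 pull the first English sentence onto the first Chinese one, and the program returns two 1-1 beads. The same routine could be run first over paragraphs and then over the sentences or clauses inside each aligned paragraph pair, mirroring the two-level procedure described above.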
