Bilingual Sentence Alignment: Balancing Robustness and Accuracy

Michel Simard, Pierre Plamondon

doi:10.1023/a:1008010319408

Abstract

Sentence alignment is the problem of making explicit the relations that exist between the sentences of two texts that are known to be mutual translations. Automatic sentence-alignment methods typically face two kinds of difficulties. First, there is the question of robustness. In real life, discrepancies between a source text and its translation are quite common: differences in layout, omissions, inversions, etc. Sentence-alignment programs must be ready to deal with such phenomena. Then, there is the question of accuracy. Even when translations are "clean", alignment is still not a trivial matter: some decisions are hard to make, even for humans. We report here on the current state of our ongoing efforts to produce a sentence-alignment program that is both robust and accurate. The method that we propose relies on two new alignment engines: one that produces highly reliable and robust character-level alignments, and one that relies on statistical lexical knowledge to produce accurate mappings. Experimental results are presented which demonstrate the method's effectiveness, and highlight where problems remain to be solved.

Full Text