Evaluation of 2-way Iraqi Arabic–English speech translation systems using automated metrics

Sherri Condon, Mark Arehart, Gregory Sanders, Dan Parvaz, John Aberdeen, Christy Doran

doi:10.1007/s10590-011-9105-x

Abstract

The Defense Advanced Research Projects Agency (DARPA) Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program (http://1.usa.gov/transtac) faced many challenges in applying automated measures of translation quality to Iraqi Arabic–English speech translation dialogues. Features of speech data in general and of Iraqi Arabic data in particular undermine basic assumptions of automated measures that depend on matching system outputs to reference translations. These features are described along with the challenges they present for evaluating machine translation quality using automated metrics. We show that scores for translation into Iraqi Arabic exhibit higher correlations with human judgments when they are computed from normalized system outputs and reference translations. Orthographic normalization, lexical normalization, and operations involving light stemming resulted in higher correlations with human judgments.

Full Text