Some insights from translating conversational telephone speech

Gaurav Kumar, Sanjeev Khudanpur, Matt Post, Daniel Povey

doi:10.1109/icassp.2014.6854197

Abstract

We report insights from translating Spanish conversational telephone speech into English text by cascading an automatic speech recognition (ASR) system with a statistical machine translation (SMT) system. The key new insight is that the informal register of conversational speech is a greater challenge for ASR than for SMT: the BLEU score for translating the reference transcript is 64%, but drops to 32% for translating automatic transcripts, whose word error rate (WER) is 40%. Several strategies are examined to mitigate the impact of ASR errors on the SMT output: (i) providing the ASR lattice, instead of the 1-best output, as input to the SMT system, (ii) training the SMT system on Spanish ASR output paired with English text, instead of Spanish reference transcripts, and (iii) improving the core ASR system. Each leads to consistent and complementary improvements in the SMT output. Compared to translating the 1-best output of an ASR system with 40% WER using an SMT system trained on Spanish reference transcripts, translating the output lattice of a better ASR system with 35% WER using an SMT system trained on ASR output improves BLEU from 32% to 38%.
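For readers unfamiliar with the WER figures quoted above, the following is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are illustrative assumptions; the abstract does not describe the scoring tool or text normalization actually used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are whitespace-separated words, with no case or
    punctuation normalization. Real scoring pipelines usually normalize first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substitution and one deletion over five words.
    reference = "hola como estas hoy amigo"
    hypothesis = "hola como esta hoy"
    print(word_error_rate(reference, hypothesis))  # 0.4
```

Because insertions count as errors but do not lengthen the reference, WER computed this way can exceed 100% for very poor hypotheses.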

Full Text