Speech Recognition System Combination for Machine Translation

M.J.F. Gales, A. Messaoudi, L. Lamel, T. Ng, R. Sinha, X. Liu, K. Yu, K. Nguyen, P.C. Woodland, L. Nguyen, S. Matsoukas, J.-L. Gauvain

doi:10.1109/icassp.2007.367310

Abstract

The majority of state-of-the-art speech recognition systems make use of system combination. The combination approaches adopted have traditionally been tuned to minimise word error rate (WER). In recent years there has been growing interest in taking the output of a speech recognition system in one language and translating it into another. This paper investigates cross-site combination approaches in terms of both WER and their impact on translation performance. In addition, the stages involved in modifying the output of a speech-to-text (STT) system to make it suitable for translation are described. Two source languages, Mandarin and Arabic, are recognised and then translated into English using a phrase-based statistical machine translation system. Results are given for the individual systems and for cross-site combination using cross-adaptation and ROVER. They show that the best STT combination scheme in terms of WER is not necessarily the most appropriate when translating speech.

Full Text