RWTH LVCSR systems for Quaero and EU-Bridge: German, Polish, Spanish and Portuguese

M Ali Basha Shaik, Markus Nußbaum-Thom, Zoltán Tüske, Ralf Schlüter, M Ali Tahir, Hermann Ney

doi:10.21437/interspeech.2014-257

Abstract

In this paper, we present the German, Polish, Spanish, and Portuguese large vocabulary continuous speech recognition (LVCSR) systems developed at RWTH Aachen University. All of these systems are used in the Quaero and EU-Bridge project evaluations. The LVCSR systems developed for these competitive evaluations cover several domains, such as broadcast news, podcasts, and lectures. Transcribing speech for these tasks is challenging because of the large variability in acoustic conditions and because a significant portion of the audio data contains spontaneous speech. Improvements are obtained using state-of-the-art multilingual bottleneck features, minimum phone error (MPE) trained acoustic models, language model (LM) adaptation, and confusion-network-based system combination. In addition, an open-vocabulary approach using morphemic units is investigated together with LM adaptation for the German LVCSR system.

Index Terms: LVCSR, European, Quaero, EU-Bridge
