Conversational Telephone Speech Recognition for Lithuanian

Rasa Lileikyte, Jean-Luc Gauvain, Lori Lamel

doi:10.1007/978-3-319-25789-1_16

Abstract

This paper presents a conversational telephone speech recognition system for the low-resourced Lithuanian language, developed in the context of the IARPA Babel program. Phoneme-based and grapheme-based systems are compared to establish whether a phonemic lexicon is necessary. We explore the impact of using Web data for language modeling and additional untranscribed data for semi-supervised training. Experimental results are reported for two conditions: Full Language Pack (FLP) and Very Limited Language Pack (VLLP), for which 40 h and 3 h of transcribed training data are available, respectively. Grapheme-based systems are shown to give results comparable to phoneme-based ones. Adding Web texts improves the performance of both the FLP and VLLP systems. The best VLLP results are achieved using both Web texts and semi-supervised training.

Full Text