Abstract

We introduce a new approach to training multilayer perceptrons (MLPs) for large vocabulary continuous speech recognition (LVCSR) in new languages that have only a few hours of annotated in-domain training data (for example, one hour). In our approach, large amounts of annotated out-of-domain data from multiple languages are used to train multilingual MLP systems without having to reconcile the different phoneme sets of those languages. Features extracted from these MLP systems are then used to train LVCSR systems in the low-resource language, similar to the Tandem approach. In our experiments, the proposed features yield a relative improvement of about 30% in a low-resource LVCSR setting with only one hour of training data.
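The Tandem-style feature extraction mentioned above can be sketched as follows: acoustic frames are passed through a trained MLP, the hidden-layer activations are taken as learned features, and these are concatenated with the original acoustic features before training the LVCSR system. This is a minimal illustrative sketch, not the paper's implementation; the layer sizes, the `tanh` nonlinearity, and the random weights standing in for a trained multilingual MLP are all assumptions.

```python
import numpy as np

def mlp_hidden_features(frames, W1, b1):
    """Forward acoustic frames through one MLP layer and return the
    hidden activations as learned features (hypothetical trained
    weights W1, b1; a real system would use a trained multilingual MLP)."""
    return np.tanh(frames @ W1 + b1)

def tandem_features(frames, mlp_feats):
    """Concatenate MLP-derived features with the original acoustic
    features frame by frame, as in the Tandem approach."""
    return np.concatenate([frames, mlp_feats], axis=1)

# Toy example: 10 frames of 39-dim acoustic features (e.g. MFCCs),
# projected to a 25-dim hidden layer.
rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 39))
W1 = rng.standard_normal((39, 25))
b1 = np.zeros(25)

hidden = mlp_hidden_features(frames, W1, b1)
features = tandem_features(frames, hidden)
print(features.shape)  # (10, 64): 39 acoustic + 25 MLP dimensions
```

In the multilingual setting described in the abstract, the MLP would be trained on pooled out-of-domain data from several languages, and only the feature-extraction step above would run on the low-resource language.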
