Abstract

Deep neural networks (DNNs) have shown great promise in exploiting out-of-language data, particularly for under-resourced languages. The common trend is to merge data from various source languages to train a multilingual DNN and then reuse its hidden layers as language-independent feature extractors for a low-resource target language. While there is a consensus that using as much data as possible from various languages yields a better and more general multilingual DNN, employing only source languages that are similar to the target language has also proven effective. In this study, we propose a novel framework for multilingual DNN training that employs all the available training data while simultaneously exploiting complementary information from the individual source languages. Toward this goal, we borrow the idea of an ensemble with one generalist and many specialists. The generalist is derived from a multilingual DNN acoustic model trained on all available multilingual data; the specialists are the DNNs derived from the individual source languages. The constituents of the ensemble are then combined using weighted averaging schemes, where the combination weights are trained to minimize the cross-entropy objective function. In this framework, we seek complementary information among the constituents while guaranteeing performance at least equal to that of the baseline. Moreover, unlike previous well-known system combination schemes, only one model is required during decoding. We successfully examined two combination methodologies and demonstrated their usefulness in different scenarios using the multilingual GlobalPhone dataset. In particular, we observe that speech recognition systems developed in low-resource settings profit from the proposed strategy.
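As a rough illustration of the weighted-averaging idea described above, the sketch below assumes parameter-level averaging of identically structured constituent DNNs (one generalist plus the language-specific specialists), with only the combination weights trained under the cross-entropy objective; this reading is consistent with the property that a single merged model suffices for decoding. The names WeightedModelAverage and train_combination_weights, and the choice of parameter-level rather than output-level averaging, are illustrative assumptions, not necessarily the paper's exact scheme.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call


class WeightedModelAverage(torch.nn.Module):
    """Hypothetical sketch: average the parameters of K identically structured
    acoustic-model DNNs (one generalist plus several language-specific
    specialists) with learnable combination weights, so that decoding only
    needs the single merged model."""

    def __init__(self, template: torch.nn.Module, state_dicts: list):
        super().__init__()
        self.template = template
        # Stack each parameter/buffer tensor across the K constituent models.
        self.stacked = {
            name: torch.stack([sd[name] for sd in state_dicts])
            for name in state_dicts[0]
        }
        # One scalar per constituent; softmax keeps the weights on the simplex.
        self.logits = torch.nn.Parameter(torch.zeros(len(state_dicts)))

    def merged_state(self):
        w = F.softmax(self.logits, dim=0)
        return {name: torch.einsum("k,k...->...", w, stack)
                for name, stack in self.stacked.items()}

    def forward(self, feats):
        # Run the shared architecture with the weighted-average parameters.
        return functional_call(self.template, self.merged_state(), (feats,))


def train_combination_weights(ensemble, loader, epochs=5, lr=0.1):
    """Fit only the combination weights by minimising frame-level cross-entropy
    on target-language data; the constituent models themselves stay fixed."""
    opt = torch.optim.SGD([ensemble.logits], lr=lr)
    for _ in range(epochs):
        for feats, senone_targets in loader:
            loss = F.cross_entropy(ensemble(feats), senone_targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
    # The weighted-average parameters define the single model used at decoding.
    return ensemble.merged_state()
```

In this sketch the generalist and specialists enter only through their stacked parameter tensors, so adding or removing a specialist changes just one scalar weight; the cross-entropy training data, learning rate, and number of epochs shown are placeholders rather than values from the paper.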
