Abstract

Cross-lingual knowledge-sharing-based acoustic modeling methods are commonly used in automatic speech recognition (ASR) for languages that do not have enough transcribed speech for acoustic model (AM) training. Conventional approaches such as IPA-based universal acoustic modeling have proved effective under matched acoustic conditions, but they usually perform poorly when there is a mismatch between the target language and the source languages. This paper proposes a method of multi-lingual unsupervised AM training for zero-resourced languages under mismatched conditions. The proposed method consists of two main steps. In the first step, an initial AM for the low-resourced target language is obtained by multi-task training, in which original source-language data and mapped source-language data are used jointly. In the second step, the AM of the target language is trained on automatically transcribed target-language data by iteratively training new AMs and adapting the initial AMs. Experiments were conducted on a corpus containing 100 hours of untranscribed Japanese speech and 300 hours of transcribed speech in other languages. The best result achieved is a character error rate (CER) of 51.75%, a 24.78% absolute reduction compared to the baseline IPA system.
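
To make the second step concrete, the following is a minimal Python sketch of the iterative unsupervised training loop described in the abstract, assuming a pseudo-labeling setup; the `decode`, `train_am`, and `adapt_am` callables are hypothetical placeholders for toolkit-specific operations and are not part of the paper.

```python
def iterative_unsupervised_training(initial_am, untranscribed_audio,
                                    decode, train_am, adapt_am,
                                    num_iterations=3):
    """Illustrative sketch (not the authors' code) of step 2:
    iteratively transcribe the target-language speech, retrain on the
    automatic transcripts, and adapt the initial AM."""
    am = initial_am
    for _ in range(num_iterations):
        # Transcribe the untranscribed target-language speech with the current AM.
        pseudo_labels = decode(am, untranscribed_audio)
        # Train a new AM on the automatically transcribed target-language data.
        new_am = train_am(untranscribed_audio, pseudo_labels)
        # Adapt the initial AM using the newly trained model (hypothetical step,
        # following the abstract's description of "adapting the initial AMs").
        am = adapt_am(am, new_am)
    return am
```

In practice the `decode`/`train_am`/`adapt_am` operations would be supplied by whatever ASR toolkit is used; the sketch only captures the loop structure, not the multi-task initialization of step 1.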
