Efficient multi-lingual unsupervised acoustic model training under mismatch conditions

Masahiro Saiko, Ryosuke Isotani, Hitoshi Yamamoto, Chiori Hori

doi:10.1109/slt.2014.7078544

Abstract

We propose a new multi-lingual unsupervised acoustic model (AM) training method for low-resourced languages under mismatch conditions. For such languages, little or no transcribed speech is available. Unsupervised acoustic modeling that uses AMs of different (not low-resourced) languages has therefore been proposed. The conventional method has been shown to be effective when the acoustic conditions, such as speaking style, of the low-resourced language and the different languages are similar. However, since it is not easy to prepare matched AMs of different languages, a mismatch between each AM and the speech of the low-resourced language arises in practice during unsupervised acoustic modeling. In this paper, we deal with this mismatch problem. To generate more accurate automatic transcriptions under mismatch conditions, we introduce two things: (1) initial AMs trained with speech of different languages that is mapped to the phonemes of the low-resourced language, and (2) an iterative process that switches back and forth between training of AMs and adaptation of the initial AMs. The proposed method, without any transcriptions, achieved a word error rate of 32.1% on the evaluation set of IWSLT2011, while the word error rates of the conventional method and the supervised training method were 39.3% and 22.7%, respectively.
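
To put the reported figures in context, the following minimal Python sketch computes word error rate in the standard way (word-level edit distance divided by the reference length) and derives two comparisons from the numbers quoted in the abstract. The function names and the toy sentences are illustrative assumptions, not part of the paper; only the three WER values come from the abstract, and the derived comparisons are simple arithmetic on them rather than results reported by the authors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative error reduction of `proposed` with respect to `baseline`."""
    return (baseline - proposed) / baseline


# WER values (in percent) as quoted in the abstract.
CONVENTIONAL_WER = 39.3
PROPOSED_WER = 32.1
SUPERVISED_WER = 22.7

# Derived comparisons (arithmetic on the quoted values, not reported results).
rel = relative_reduction(CONVENTIONAL_WER, PROPOSED_WER)
gap = PROPOSED_WER - SUPERVISED_WER

# Toy sentences (hypothetical) to exercise the WER function.
toy_wer = word_error_rate("a b c d", "a x c")

print(f"relative WER reduction vs. conventional: {rel:.1%}")
print(f"absolute gap to supervised training: {gap:.1f} points")
print(f"toy WER: {toy_wer:.2f}")
```

Under these definitions, the proposed method reduces the word error rate by roughly 18% relative to the conventional method, while remaining about 9.4 absolute points above supervised training.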

Full Text