Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptation

Sheng Li, Tatsuya Kawahara, Xugang Lu, Yuya Akita

doi:10.21437/interspeech.2015-608

Abstract

In this paper, we introduce an ensemble speaker modeling method using a speaker adaptive training (SAT) deep neural network (SAT-DNN). We first train a speaker-independent DNN (SI-DNN) acoustic model as a universal speaker model (USM). Based on the USM, a SAT-DNN is used to obtain a set of speaker-dependent models by assuming that all layers except one speaker-dependent (SD) layer are shared among speakers. The speaker ensemble matrix is created by concatenating all of the SD neural weight matrices. Using a matrix factorization technique, an ensemble speaker subspace is extracted from this matrix. At test time, an initial model for each target speaker is selected in this ensemble speaker subspace, and adaptation is then carried out to obtain the final acoustic model. To reduce the number of adaptation parameters, a low-rank speaker subspace is further explored. We test our algorithm on a lecture transcription task. Experimental results show that the proposed method is effective for unsupervised speaker adaptation.

Index Terms: speaker adaptation, deep neural networks, ensemble modeling, lecture transcription
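
The subspace-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each SD weight matrix is flattened into one column of the ensemble matrix, that the factorization is a truncated SVD of the mean-centered matrix, and that a speaker is represented by coordinates in the resulting basis. The function names (`build_ensemble_matrix`, `extract_subspace`, `project`, `reconstruct`) are hypothetical, and the abstract does not specify any of these choices.

```python
import numpy as np

def build_ensemble_matrix(sd_weights):
    """Stack each speaker's flattened SD-layer weights as one column (assumed layout)."""
    return np.stack([w.ravel() for w in sd_weights], axis=1)

def extract_subspace(ensemble, rank):
    """Return the mean column and the top-`rank` left singular vectors.

    Truncated SVD stands in for the unspecified matrix factorization.
    """
    mean = ensemble.mean(axis=1, keepdims=True)
    u, _, _ = np.linalg.svd(ensemble - mean, full_matrices=False)
    return mean, u[:, :rank]

def project(weights, mean, basis):
    """Coordinates of one SD weight matrix in the speaker subspace."""
    return basis.T @ (weights.reshape(-1, 1) - mean)

def reconstruct(coords, mean, basis, shape):
    """Map subspace coordinates back to an SD weight matrix."""
    return (mean + basis @ coords).reshape(shape)
```

Under these assumptions, only the `rank` coordinates returned by `project` would need to be adapted for a new speaker, rather than the full SD weight matrix, which is one way to read the reduction in adaptation parameters mentioned in the abstract.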

Full Text