A speaker‐adaptation technique for context‐dependent models represented by hidden Markov networks

Jun-Ichi Takami, Shigeki Sagayama

doi:10.1002/scj.4690270207

Abstract

This study aims to realize a speaker‐independent speech recognition system based on supervised speaker adaptation. This paper describes a highly accurate speaker‐adaptation technique that uses a small number of training samples. When only a small number of speech samples are available for adaptation, two problems arise: the samples do not carry enough information to update a large number of model parameters simultaneously, and the statistical bias of the samples introduces estimation errors. From this viewpoint, this paper proposes a speaker‐adaptation technique using the hidden Markov network (HMnet), which employs fewer model parameters than the phoneme‐context‐independent continuous mixture‐density phoneme HMM while achieving equal or better recognition performance. The moving vector field smoothing (VFS) method is used as the adaptation technique. This method simultaneously interpolates the unadapted model parameters, to cope with the small number of samples, and corrects estimation errors in the speaker adaptation. The standard speaker pre‐selection method is also investigated to improve the accuracy of the speaker adaptation.
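The abstract does not give the VFS equations, so the following is only a minimal sketch of the general idea it describes: means that received adaptation data yield transfer vectors, means without data borrow an interpolated transfer vector from their nearest neighbours, and the transfer vectors of trained means are smoothed the same way. The function name `vfs_adapt`, the parameters `k` and `fuzziness`, and the exponential distance weighting are all assumptions made for illustration, not details taken from the paper.

```python
import math


def vfs_adapt(si_means, adapted, k=2, fuzziness=1.0):
    """Illustrative vector-field-smoothing-style update (assumed form).

    si_means: list of speaker-independent mean vectors (lists of floats).
    adapted:  dict {index: re-estimated mean} for means that had adaptation data.
    Returns a list with one adapted mean per entry of si_means.
    """
    # Transfer vector = re-estimated mean minus speaker-independent mean.
    transfer = {
        i: [a - s for a, s in zip(adapted[i], si_means[i])] for i in adapted
    }

    def weighted_transfer(i):
        # k nearest trained means, measured in the speaker-independent space.
        nearest = sorted(
            transfer, key=lambda j: math.dist(si_means[i], si_means[j])
        )[:k]
        # Assumed weighting: exponential decay with distance.
        weights = [
            math.exp(-math.dist(si_means[i], si_means[j]) / fuzziness)
            for j in nearest
        ]
        total = sum(weights)
        dim = len(si_means[i])
        return [
            sum(w * transfer[j][d] for w, j in zip(weights, nearest)) / total
            for d in range(dim)
        ]

    result = []
    for i, mean in enumerate(si_means):
        # Untrained means: interpolation. Trained means: smoothing
        # (the mean itself is its own nearest neighbour, at distance 0).
        vec = weighted_transfer(i)
        result.append([m + v for m, v in zip(mean, vec)])
    return result


if __name__ == "__main__":
    si = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]
    new = vfs_adapt(si, {0: [1.0, 1.0], 1: [2.0, 1.0]}, k=2)
    print(new)
```

In this toy call, the third mean has no adaptation data, so its shift is derived entirely from the two trained neighbours. A larger `fuzziness` spreads the weight more evenly across neighbours, which smooths more aggressively.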
