This paper proposes an efficient simulated-data adaptation method for robust speech recognition. The method is applied to tree-structured piecewise linear transformation (PLT). The original PLT selects an acoustic model using tree-structured HMMs and then adapts that model to the input speech in an unsupervised manner. This adaptation can degrade the acoustic model if the input speech is incorrectly transcribed during the adaptation process. Moreover, adaptation may not be effective if only the input speech is used. The proposed method enlarges the adaptation data by adding noise portions taken from the input speech to a set of prerecorded clean speech whose correct transcriptions are known. We investigate various configurations of the proposed method. Evaluations are performed on both speech with additive noise and real noisy speech. The experimental results show that the proposed system achieves a higher recognition rate than MLLR, HMM-based model selection, and PLT.
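The data-simulation step described above can be illustrated with a minimal sketch. It assumes that the noise portion has already been extracted from the input speech (for example, by a voice activity detector), that all waveforms share one sampling rate, and that the noise is simply looped to cover each clean utterance. The function names `simulate_noisy_utterance` and `build_adaptation_set` are hypothetical, and the sketch is not the paper's implementation; it omits any gain or SNR control and the PLT adaptation itself.

```python
from itertools import cycle, islice


def simulate_noisy_utterance(clean, noise):
    """Add a noise segment to a clean utterance, sample by sample.

    Assumption: the noise segment is looped so that it covers the whole
    clean utterance; the source does not specify how lengths are matched.
    """
    if not noise:
        raise ValueError("noise segment must be non-empty")
    looped_noise = islice(cycle(noise), len(clean))
    return [c + n for c, n in zip(clean, looped_noise)]


def build_adaptation_set(clean_corpus, noise):
    """Build (simulated waveform, transcription) pairs for adaptation.

    clean_corpus: list of (waveform, transcription) pairs recorded in
    clean conditions, so every transcription is known to be correct.
    """
    return [
        (simulate_noisy_utterance(waveform, noise), transcription)
        for waveform, transcription in clean_corpus
    ]
```

Because each simulated utterance keeps the transcription of its clean source, the enlarged adaptation set can be used without relying on a possibly incorrect recognition result for the input speech.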