A comparison of novel techniques for rapid speaker adaptation

Timothy J Hazen

doi:10.1016/s0167-6393(99)00059-x

Abstract

This paper introduces two novel techniques for rapid speaker adaptation: reference speaker weighting and consistency modeling. Also presented is an adaptation technique called speaker cluster weighting (SCW), which provides a means for improving upon generic hierarchical speaker clustering techniques. Each of these adaptation methods attempts to exploit the within-speaker correlations that exist between the acoustic realizations of different phones. By accounting for these correlations, a limited amount of adaptation data can be used to adapt every phonetic acoustic model, including those for phones that have not been observed in the adaptation data. Results were obtained on the DARPA Resource Management corpus for a set of rapid adaptation experiments in which single test utterances were used for adaptation and recognition simultaneously. Using the new adaptation techniques, relative word error rate reductions ranging from 4.9% to 8.4% were obtained under various conditions. Using a combination of hierarchical speaker clustering techniques and the novel adaptation techniques, a word error rate reduction of 20% was achieved over the baseline speaker-independent (SI) recognition system.

Full Text