Abstract

Despite the superior classification ability of deep neural networks (DNNs), their performance degrades when there is a mismatch between training and testing conditions. Many speaker adaptation techniques have been proposed for DNN acoustic modeling, but progress on environmental robustness remains limited. Techniques developed for speaker adaptation can also be used to handle the impact of the environment, or the two approaches can be combined. Directly adapting the large number of DNN parameters is challenging when the adaptation set is small. The learning hidden unit contributions (LHUC) technique for unsupervised speaker adaptation of DNNs adds speaker-dependent parameters to an existing speaker-independent network to improve automatic speech recognition (ASR) performance for a target speaker using small amounts of adaptation data. This paper investigates LHUC for adapting the recognizer to target speakers and environments, quantifying the effects of speaker and noise differences separately. Our findings show that LHUC is capable of adapting to both speaker and noise conditions at the same time. Compared to the speaker-independent model, relative word error rate (WER) improvements of about 9% to 13% are observed for all test conditions on the AMI meeting corpus.
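To make the adaptation mechanism concrete, below is a minimal sketch of LHUC-style adaptation, assuming a simple feed-forward speaker-independent (SI) acoustic model; the layer sizes, function names, and training loop are illustrative assumptions, not the paper's implementation. Each hidden layer's activations are re-scaled element-wise by 2*sigmoid(theta), where theta is a small per-speaker (or per-condition) parameter vector; only theta is trained on the adaptation data, while all SI weights stay frozen.

```python
# Hypothetical LHUC sketch (PyTorch). Names and dimensions are illustrative.
import torch
import torch.nn as nn

class LHUCLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)         # SI parameters (frozen at adaptation time)
        self.theta = nn.Parameter(torch.zeros(out_dim))  # speaker/condition-dependent LHUC parameters

    def forward(self, x):
        h = torch.relu(self.linear(x))
        # Element-wise re-scaling of hidden unit contributions, amplitude in (0, 2)
        return 2.0 * torch.sigmoid(self.theta) * h

class LHUCNet(nn.Module):
    def __init__(self, feat_dim=40, hidden=512, n_states=2000, n_layers=3):
        super().__init__()
        dims = [feat_dim] + [hidden] * n_layers
        self.layers = nn.ModuleList(
            LHUCLayer(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])
        )
        self.out = nn.Linear(hidden, n_states)           # senone posteriors (SI, frozen)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return self.out(x)

def adapt_lhuc(model, adaptation_batches, lr=0.1, epochs=3):
    """Unsupervised adaptation: train only the LHUC parameters on (features,
    pseudo-labels) from a first-pass SI decoding; all SI weights stay fixed."""
    for p in model.parameters():
        p.requires_grad_(False)
    lhuc_params = [layer.theta for layer in model.layers]
    for p in lhuc_params:
        p.requires_grad_(True)
    optimizer = torch.optim.SGD(lhuc_params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for feats, pseudo_labels in adaptation_batches:
            optimizer.zero_grad()
            loss = loss_fn(model(feats), pseudo_labels)
            loss.backward()
            optimizer.step()
```

Because only the per-layer theta vectors are updated, the number of adapted parameters is small (one scalar per hidden unit), which is what makes the method practical with limited adaptation data from a target speaker or acoustic environment.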
