Abstract
Recently, we have proposed a novel fast adaptation method for hybrid DNN-HMM models in speech recognition [1]. This method relies on learning an adaptation NN that transforms the input speech features of a given speaker into a more speaker-independent space, conditioned on a suitable speaker code. A speaker code is learned for each speaker during adaptation, while the weights of the adaptation NN are learned from the whole multi-speaker training dataset. Our previous work has shown that this method is quite effective in adapting DNNs even when only a very small amount of adaptation data is available. However, the method does not work well for convolutional neural networks (CNNs). In this paper, we investigate the fast adaptation of CNN models. We first modify the speaker-code-based adaptation method to better suit the CNN structure. Moreover, we investigate a new adaptation scheme using speaker-specific adaptive node output weights, which scale the outputs of different nodes to optimize the model for new speakers. Experimental results on the TIMIT dataset demonstrate that both methods are quite effective for adapting CNN-based acoustic models, and that even better performance can be achieved by combining the two methods.
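To make the two adaptation schemes described in the abstract more concrete, the sketch below illustrates (a) a speaker-code-conditioned adaptation network that maps input features toward a speaker-independent space, and (b) speaker-specific adaptive output weights that scale the outputs of convolutional nodes. This is a minimal illustration in PyTorch; all layer sizes, module names, and structural choices are assumptions for clarity and are not taken from the paper.

```python
import torch
import torch.nn as nn

class SpeakerCodeAdapter(nn.Module):
    """Adaptation NN (sketch): transforms speaker-dependent input features
    toward a speaker-independent space, conditioned on a learned speaker code.
    Sizes are illustrative, not from the paper."""
    def __init__(self, feat_dim=40, code_dim=50, hidden_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + code_dim, hidden_dim),
            nn.Sigmoid(),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, feats, speaker_code):
        # feats: (batch, feat_dim); speaker_code: (batch, code_dim)
        return self.net(torch.cat([feats, speaker_code], dim=-1))


class NodeOutputScaling(nn.Module):
    """Speaker-specific adaptive node output weights (sketch): one scaling
    factor per convolutional feature map, learned during adaptation for
    each new speaker."""
    def __init__(self, num_maps):
        super().__init__()
        # Per-speaker parameters, initialized to 1 (no scaling).
        self.scale = nn.Parameter(torch.ones(num_maps))

    def forward(self, conv_out):
        # conv_out: (batch, num_maps, freq, time)
        return conv_out * self.scale.view(1, -1, 1, 1)
```

In this kind of setup, adaptation to a new speaker would typically update only the small set of speaker-specific parameters (the speaker code and/or the scaling weights) while the shared CNN and adaptation-network weights stay frozen, which is consistent with the abstract's emphasis on fast adaptation from very little data.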