Abstract

Adaptation to speaker variations is an essential component of speech recognition systems. One common approach to adapting deep neural network (DNN) acoustic models is to perform global constrained maximum likelihood linear regression (CMLLR) at some point of the systems. Using CMLLR (or more generally, generative approaches) is advantageous especially in unsupervised adaptation scenarios with high baseline error rates. On the other hand, as the DNNs are less sensitive to the increase in the input dimensionality than GMMs, it is becoming more popular to use rich speech representations, such as log mel-filter bank channel outputs, instead of conventional low-dimensional feature vectors, such as MFCCs and PLP coefficients. This work discusses and compares three different configurations of DNN acoustic models that allow CMLLR-based speaker adaptive training (SAT) to be performed in systems with filter bank inputs. Results of unsupervised adaptation experiments conducted on three different data sets are presented, demonstrating that, by choosing an appropriate configuration, SAT with CMLLR can improve the performance of a well-trained filter bank-based speaker independent DNN system by 10.6% relative in a challenging task with a baseline error rate above 40%. It is also shown that the filter bank features are advantageous than the conventional features even when they are used with SAT models. Some other insights are also presented, including the effects of block diagonal transforms and system combination.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.