Abstract

Speaker adaptation transforms standard speaker-independent acoustic models into a model adapted to the user (called the target speaker) in order to provide reliable speech recognition performance. Conventional adaptation techniques such as Maximum Likelihood Linear Regression (MLLR) and Maximum A Posteriori (MAP) adaptation have been applied successfully to speech recognition tasks, but their performance depends heavily on the amount of adaptation data. In contrast, eigenvoice-based adaptation is known to provide reliable performance regardless of the amount of data, even when very little is available. In this study, we propose an efficient eigenvoice adaptation approach to construct more reliable adapted models. The proposed approach merges eigenvoice sets for possible eigenvoice combinations, and then selects the optimal eigenvoice sets that are most relevant to the target speaker. For this task, we propose an efficient unsupervised eigenvoice selection method as well as a rapid merging technique. In speech recognition experiments on the Defense Advanced Research Projects Agency's Resource Management corpus, the proposed approach outperformed conventional methods in both recognition accuracy and time complexity.
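
For readers unfamiliar with the baseline, the following is a minimal sketch of plain eigenvoice adaptation, not the merging and selection method proposed in this study. It assumes each training speaker is summarized by a "supervector" of concatenated model means, obtains eigenvoices by PCA, and estimates the target speaker's weights by least-squares projection; the classical formulation instead uses maximum-likelihood eigen-decomposition, which is omitted here. The function names `build_eigenvoices` and `adapt` are hypothetical.

```python
import numpy as np

def build_eigenvoices(supervectors, k):
    """PCA over training-speaker supervectors (one speaker per row).

    Returns the mean supervector and the top-k eigenvoices (orthonormal rows).
    """
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def adapt(mean, eigenvoices, target_estimate):
    """Project a rough target supervector onto the eigenvoice space.

    Assumption: least-squares projection stands in for the
    maximum-likelihood weight estimation used in practice.
    """
    weights = eigenvoices @ (target_estimate - mean)
    adapted = mean + weights @ eigenvoices
    return adapted, weights

# Synthetic demo: 20 training speakers, 50-dim supervectors, 3 latent factors.
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 50))
offset = rng.normal(size=50)
train = offset + rng.normal(size=(20, 3)) @ basis

mean, eigenvoices = build_eigenvoices(train, k=3)

true_target = offset + rng.normal(size=3) @ basis
noisy_target = true_target + 0.5 * rng.normal(size=50)  # few-data estimate
adapted, weights = adapt(mean, eigenvoices, noisy_target)
```

Because only `k` weights are estimated rather than every model parameter, the adapted supervector is constrained to the space spanned by the training speakers, which is why the technique tolerates very small amounts of adaptation data.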
