Abstract

A speech recognition system that is robust to speaker variation needs a reliable adaptation algorithm. Most adaptation techniques require a large amount of adaptation data from the target speaker. The time needed to gather and transcribe adaptation utterances, together with the time needed to run the adaptation itself, limits their application to speech recognition. We propose a rapid approach to speaker adaptation. We use HMM-Sufficient Statistics to store speaker-dependent subspaces, and N-Closest speaker selection to resolve how those subspaces are combined during recognition. Because the N-Closest speakers are selected using the target speaker's own utterance, the adapted model corresponds directly to the target speaker. The proposed method employs a series of adaptation processes: a general model is trained first, then adapted to broad gender/age classes, which are in turn adapted to speaker-specific data. Since the HMM-Sufficient Statistics are pre-computed offline, little computation is needed to carry out the adaptation online. Moreover, the method requires only a single arbitrary utterance from the target speaker. In this paper, we discuss the modification, expansion, and improvement of rapid adaptation based on HMM-Sufficient Statistics in the framework of Baum-Welch and maximum likelihood linear regression (MLLR). Experimental results using conventional MLLR, speaker-adaptive training, and constrained MLLR (CMLLR) are evaluated and compared. We also test robustness in office, car, crowd, and booth environments under several SNR conditions.
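
The selection-and-pooling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each training speaker is summarized by the sufficient statistics of a single diagonal-covariance Gaussian rather than a full HMM, and that closeness is measured by the log-likelihood of the target utterance under that same Gaussian. All names (`SpeakerStats`, `select_n_closest`, `pool_and_estimate`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SpeakerStats:
    """Pre-computed (offline) sufficient statistics for one training speaker."""
    name: str
    count: float          # zeroth-order statistic (occupancy)
    first: List[float]    # first-order statistic: sum of feature frames
    second: List[float]   # second-order statistic: sum of squared frames

    def mean(self) -> List[float]:
        return [f / self.count for f in self.first]

    def var(self) -> List[float]:
        # Diagonal variance, floored to avoid division by zero.
        return [max(s / self.count - m * m, 1e-6)
                for s, m in zip(self.second, self.mean())]


def log_likelihood(frames: Sequence[Sequence[float]], spk: SpeakerStats) -> float:
    """Log-likelihood of the frames under the speaker's diagonal Gaussian."""
    mean, var = spk.mean(), spk.var()
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def select_n_closest(frames: Sequence[Sequence[float]],
                     speakers: Sequence[SpeakerStats],
                     n: int) -> List[SpeakerStats]:
    """Rank training speakers by likelihood of the target utterance; keep the top n."""
    ranked = sorted(speakers, key=lambda s: log_likelihood(frames, s), reverse=True)
    return ranked[:n]


def pool_and_estimate(selected: Sequence[SpeakerStats]) -> Tuple[List[float], List[float]]:
    """Sum the stored statistics and re-estimate mean and variance (ML estimate)."""
    count = sum(s.count for s in selected)
    dim = len(selected[0].first)
    first = [sum(s.first[d] for s in selected) for d in range(dim)]
    second = [sum(s.second[d] for s in selected) for d in range(dim)]
    mean = [f / count for f in first]
    var = [max(s / count - m * m, 1e-6) for s, m in zip(second, mean)]
    return mean, var
```

Online, only the ranking and a sum over the stored statistics are computed, which is why the cost stays low in this sketch: no training frames from the selected speakers are revisited.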
