Speaker Adaptation With Limited Data Using Regression-Tree-Based Spectral Peak Alignment

Shizhen Wang, Xiaodong Cui, Abeer Alwan

doi:10.1109/tasl.2007.906740

Abstract

Spectral mismatch between training and testing utterances can cause significant degradation in the performance of automatic speech recognition (ASR) systems. Speaker adaptation and speaker normalization techniques are usually applied to address this issue. One way to reduce spectral mismatch is to reshape the spectrum by aligning corresponding formant peaks. There are various levels of mismatch in formant structures. In this paper, regression-tree-based phoneme- and state-level spectral peak alignment is proposed for rapid speaker adaptation using linearization of the vocal tract length normalization (VTLN) technique. This method is investigated in a maximum-likelihood linear regression (MLLR)-like framework, taking advantage of both the efficiency of frequency warping (VTLN) and the reliability of statistical estimations (MLLR). Two different regression classes are investigated: one based on phonetic classes (using combined knowledge and data-driven techniques) and the other based on Gaussian mixture classes. Compared to MLLR, VTLN, and global peak alignment, improved performance can be obtained in both supervised and unsupervised adaptation, on both a medium-vocabulary task (the RM1 database) and a connected-digit recognition task (the TIDIGITS database). Performance improvements are largest with limited adaptation data, which is often the case for ASR applications, and these improvements are shown to be statistically significant.
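As a rough illustration of the peak-alignment idea mentioned in the abstract, the sketch below estimates a single global warping factor from paired spectral peak frequencies and applies a piecewise-linear frequency warp of the kind commonly used in VTLN. It is not the paper's regression-tree method: the function names, the least-squares estimate, the 8 kHz band edge, the 0.8 breakpoint ratio, and the example peak values are all assumptions made for illustration.

```python
def estimate_warp_factor(speaker_peaks, reference_peaks):
    """Least-squares slope alpha minimizing sum((ref - alpha * spk)^2).

    Hypothetical helper: assumes the peaks are already paired
    (first peak with first peak, and so on), in Hz.
    """
    num = sum(s * r for s, r in zip(speaker_peaks, reference_peaks))
    den = sum(s * s for s in speaker_peaks)
    return num / den


def warp_frequency(f, alpha, f_max=8000.0, break_ratio=0.8):
    """Piecewise-linear warp of a frequency f (Hz).

    Below the breakpoint the frequency is scaled by alpha; above it,
    a second linear segment maps f_max onto f_max so that the warped
    axis still covers the full band. f_max and break_ratio are
    illustrative assumptions.
    """
    f_break = break_ratio * f_max
    if f <= f_break:
        return alpha * f
    slope = (f_max - alpha * f_break) / (f_max - f_break)
    return alpha * f_break + slope * (f - f_break)


# Made-up example: the speaker's peaks sit 10% above the reference peaks.
speaker_peaks = [550.0, 1650.0, 2750.0]
reference_peaks = [500.0, 1500.0, 2500.0]
alpha = estimate_warp_factor(speaker_peaks, reference_peaks)
aligned_peaks = [warp_frequency(f, alpha) for f in speaker_peaks]
```

A regression-tree variant would presumably estimate a separate factor for each regression class (for example, each phonetic class) rather than one global `alpha`; the abstract does not give enough detail to sketch that step here.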

Journal: IEEE Transactions on Audio, Speech and Language Processing
Publication Date: Nov 1, 2007
Citations: 43

Similar Papers

Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC
Sankaran Panchapagesan ... Abeer Alwan
Computer Speech & Language | VOL. 23
02 Mar 2008

Speaker Segmentation and Adaptation for Speech Recognition on Multiple-Speaker Audio Conference Data
Zhu Liu ... Murat Saraclar
01 Jul 2007

On combining vocal tract length normalisation and speaker adaptation for noise robust speech recognition
Ramalingam Hariharan ... Olli Viikki
05 Sep 1999

Adaptation of children’s speech with limited data based on formant-like peak alignment
Xiaodong Cui ... Abeer Alwan
Computer Speech & Language | VOL. 20
12 Jul 2005
