Abstract

We propose a discriminative fuzzy clustering maximum a posterior linear regression (DFCMAPLR) model adaptation approach to compensate the acoustic mismatch due to speaker variability. The DFCMAPLR approach adopts the MAP criterion and a discriminative objective function to estimate shared affine transform and fuzzy weight sets, respectively. Then, through a linear combination of the calculated fuzzy weights and shared affine transforms, more specific affine transforms are formed for model adaptation. By incorporating the MAP criterion and the discriminative information, DFCMAPLR can calculate shared affine transforms reliably and enhance the discriminative power of the adapted acoustic model. Based on the experimental results on the ASTTEL200 Mandarin corpus, we verified that DFCMAPLR outperforms not only the conventional maximum likelihood linear regression (MLLR) but also the fuzzy clustering MLLR(FCMLLR), which estimates the shared affine transform and fuzzy weight sets both based on the maximum likelihood criterion. Moreover, when compared to the baseline result, DFCMAPLR provides a clear improvement of 9.86% (24.04% to 21.67%) relative average phone error rate (PER) reduction.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.