Abstract

In previous work [1], we proposed a speaker adaptation technique based on the second subglottal resonance (Sg2), which showed good performance relative to vocal tract length normalization (VTLN). In this paper, we propose a more reliable algorithm for automatically estimating Sg2 from speech signals. The algorithm is calibrated on children’s speech data collected simultaneously with accelerometer recordings from which Sg2 frequencies can be directly measured. To investigate whether Sg2 frequencies are independent of speech content and language, we perform a cross-language study with bilingual Spanish-English children. The study verifies that Sg2 is approximately constant for a given speaker and thus can be a good candidate for limited data speaker normalization and cross-language adaptation. We then present a cross-language speaker normalization method based on Sg2, which is computationally more efficient than maximum-likelihood based VTLN, and performs more robustly than VTLN.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.