Abstract

This paper proposes an automatic algorithm for estimating the first two subglottal resonances (SGRs), Sg1 and Sg2, from continuous speech of children, and applies it to automatic speaker normalization in mismatched, limited-data conditions. The proposed algorithm is based on the observation that Sg1 and Sg2 form phonological vowel feature boundaries, and is motivated by our recent SGR estimation algorithm for adults. The algorithm is trained and evaluated, respectively, on 25 and 9 children, aged between 7 and 18 years. The average RMS errors incurred in estimating Sg1 and Sg2 are 55 and 144 Hz, respectively. By applying the proposed algorithm to a connected-digits speech recognition task, it is shown that: 1) a linear frequency warping using Sg1 or Sg2 is comparable to or better than maximum likelihood-based vocal tract length normalization (ML-VTLN), 2) the performance of SGR-based frequency warping is less content dependent than that of ML-VTLN, and 3) SGR-based frequency warping can be integrated into ML-VTLN to yield a statistically significant improvement in performance.
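To make the normalization step concrete, the following is a minimal sketch of linear frequency warping driven by a subglottal resonance. It assumes the warp factor is the ratio of a reference speaker's SGR to the test speaker's estimated SGR, and that this factor rescales the frequency axis before feature extraction; the particular reference values and function names are illustrative, not from the paper.

```python
def sgr_warp_factor(sgr_speaker_hz: float, sgr_reference_hz: float) -> float:
    """Warp factor as the ratio of a reference SGR to the speaker's SGR.

    Both arguments are in Hz; sgr_speaker_hz would come from the paper's
    automatic Sg1/Sg2 estimator, and sgr_reference_hz from a chosen
    reference speaker (values here are hypothetical).
    """
    if sgr_speaker_hz <= 0:
        raise ValueError("SGR estimate must be positive")
    return sgr_reference_hz / sgr_speaker_hz


def warp_frequency(freq_hz: float, alpha: float) -> float:
    """Linearly rescale a frequency (e.g., a filterbank center) by alpha."""
    return alpha * freq_hz


# Example: a child's Sg1 estimated at 700 Hz, reference Sg1 at 600 Hz.
alpha = sgr_warp_factor(700.0, 600.0)
warped = warp_frequency(1000.0, alpha)  # 1000 Hz on the child's axis
```

In practice the warp would be applied to the filterbank used for acoustic features, which is how it can be combined with, or substituted for, the ML-VTLN warp search described in the abstract.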
