Abstract

A new approach to speech parameter normalization is presented in which no prior knowledge about the input speakers is required. The vocal-tract length and area function are first estimated from the acoustic speech waveform, and then the area function is normalized to an acoustic tube of the same shape having a certain reference length. The normalized formant frequencies are defined as the resonance frequencies of this acoustic tube. The distributions of unnormalized and normalized formant frequencies for 9 stationary American vowels were investigated with 14 male and 12 female speakers. Fairly compact distributions of the vowels in the normalized F1-F2-F3 space were obtained. A preliminary identification test for stationary vowels based on this normalization method showed an expected average recognition rate of 84-96 percent for arbitrarily selected speakers, depending on the phonetic criteria adopted for defining "correct" identification.
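
The length-normalization step can be illustrated with a minimal sketch. It is not the paper's estimation procedure: it assumes the vocal-tract length has already been estimated, and it relies only on the property that a lossless acoustic tube keeps its shape when rescaled in length, so its resonance frequencies scale inversely with length. The function names, the 17.5 cm reference length, and the 35000 cm/s speed of sound are illustrative assumptions, not values taken from the abstract.

```python
# Illustrative sketch only: names and constants are assumptions, not from the paper.
SPEED_OF_SOUND_CM_S = 35000.0  # assumed speed of sound in the vocal tract
REF_LENGTH_CM = 17.5           # assumed reference tube length


def uniform_tube_formants(length_cm, n=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances of a uniform tube closed at one end and open at the other.

    Quarter-wavelength rule: F_k = (2k - 1) * c / (4 * L).
    """
    return [(2 * k - 1) * c / (4.0 * length_cm) for k in range(1, n + 1)]


def normalize_formants(formants_hz, length_cm, ref_length_cm=REF_LENGTH_CM):
    """Map formants of a tube of length `length_cm` to the reference length.

    Stretching a tube of fixed shape by a factor s divides every resonance
    by s, so the normalized formants are F * (L / L_ref).
    """
    scale = length_cm / ref_length_cm
    return [f * scale for f in formants_hz]


if __name__ == "__main__":
    # A shorter (14 cm) uniform tube has higher formants than the reference tube.
    short_tube = uniform_tube_formants(14.0)
    print("unnormalized:", short_tube)
    # After normalization they coincide with those of the 17.5 cm reference tube.
    print("normalized:  ", normalize_formants(short_tube, 14.0))
    print("reference:   ", uniform_tube_formants(REF_LENGTH_CM))
```

In this toy case the shape is uniform, so the normalized formants of the short tube match those of the reference tube. For real vowels the estimated area function fixes the shape, and the same length scaling is applied to its resonances.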

