Abstract

A method is proposed for estimating individual vocal-tract parameters from formant frequency patterns. Vocal-tract parameters, such as scaling factors for vocal-tract length and area, were determined with reference to the given area function of an articulatory synthesizer. Such speaker-specific parameters are expected to be suitable for speaker normalization in automatic speech recognition. Schroeder (1967) showed that the relative formant frequency deviation δω/ω is linearly related to a relative perturbation δA/A of the area function. A similar relationship can be derived for a length perturbation of the area function. The presented method is based on the inverse relationship: deviations of a speaker's formant frequency pattern from that of the reference speaker are attributed to length and area perturbations of the reference speaker's area function. A pilot study using synthetic speech showed promising results: the estimated vocal-tract area functions were in good agreement with natural area functions. In the present study the method is extended to real speech. Speech samples from a German database (PhonDat) are investigated to find speaker-specific parameters. [Work supported by Deutsche Forschungsgemeinschaft (Str 255/7-2).]
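The inverse relationship described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the relative formant deviations are a linear combination of one relative length perturbation and a few relative area perturbations, and that the resulting system is solved by ordinary least squares. The function name `estimate_perturbations`, the sensitivity matrix `S_area`, and all numeric values are hypothetical. The only physical fact built in is the first-order behaviour of a uniform tube, for which lengthening by δL/L lowers every formant by about the same relative amount.

```python
import numpy as np

def estimate_perturbations(f_ref, f_spk, S_area):
    """Least-squares estimate of relative length and area perturbations.

    f_ref, f_spk : formant frequencies (Hz) of the reference and target speaker
    S_area       : hypothetical (n_formants, n_sections) sensitivity matrix,
                   S_area[n, i] = relative shift of formant n per unit dA_i/A_i
    Returns (dL_over_L, dA_over_A).
    """
    f_ref = np.asarray(f_ref, dtype=float)
    f_spk = np.asarray(f_spk, dtype=float)
    S_area = np.asarray(S_area, dtype=float)

    # Relative formant deviation of the speaker from the reference.
    rel_dev = (f_spk - f_ref) / f_ref

    # Assumption: a uniform relative lengthening dL/L lowers each formant
    # by about dL/L (first-order result for a uniform tube).
    length_column = -np.ones(len(f_ref))
    S = np.column_stack([length_column, S_area])

    # Invert the linear model rel_dev = S @ params in the least-squares sense.
    params, *_ = np.linalg.lstsq(S, rel_dev, rcond=None)
    return params[0], params[1:]

# Illustrative values only: four formants, two area sections.
f_ref = np.array([500.0, 1500.0, 2500.0, 3500.0])
S_area = np.array([[0.3, -0.2],
                   [-0.1, 0.4],
                   [0.2, 0.1],
                   [-0.3, -0.1]])
f_spk = np.array([520.0, 1440.0, 2480.0, 3450.0])
dL, dA = estimate_perturbations(f_ref, f_spk, S_area)
print("relative length change:", dL)
print("relative area changes:", dA)
```

With more formants than unknowns the system is overdetermined, so least squares returns the perturbations that best explain the measured deviations. In practice the sensitivity matrix would have to be derived from the reference area function rather than chosen by hand.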
