Abstract

A novel method is presented for automatically estimating voice source and formant parameters from a speech utterance, based on an autoregressive with exogenous input (ARX) speech production model. The method was tested on both synthetic and natural speech materials and compared with the well-established linear prediction (LP) method in terms of the accuracy of the estimated formant frequencies. The new method outperformed the LP method in every test. In particular, for materials with an average pitch frequency of 447 Hz, the error of the proposed method was as small as 6%, whereas the LP method gave an error of 21%. This result clearly indicates the superiority of the proposed method over the LP method for the very high-pitched voices produced by females and children. Other key features of the proposed method are: (1) the very low first formant frequencies of the high vowels /i/ and /u/ can be accurately estimated; (2) both the voicing source amplitude and the open quotient of the glottal flow pulse are reliably estimated; and (3) an adaptive prefilter in the analysis markedly improves the accuracy of the estimated parameter values.
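To make the ARX idea concrete, the sketch below fits the model s[n] + a_1 s[n-1] + ... + a_p s[n-p] = b0 u[n] by least squares and converts the roots of A(z) into formant frequencies and bandwidths. It is a minimal illustration under strong assumptions, not the method of this paper: the source waveform u[n] is assumed to be known exactly (white noise stands in for a glottal flow derivative), there is no measurement noise, and neither the joint estimation of source parameters (amplitude, open quotient) nor the adaptive prefilter is modelled. All function names are hypothetical. The explicit source term is what distinguishes ARX from plain LP, which instead treats the excitation as white; that is one commonly cited reason LP formant estimates can be pulled toward harmonics at high pitch.

```python
import numpy as np


def ar_poly_from_formants(freqs_hz, bws_hz, fs):
    """Hypothetical helper: build a_1..a_p of A(z) from resonance frequencies/bandwidths."""
    poles = []
    for f, b in zip(freqs_hz, bws_hz):
        r = np.exp(-np.pi * b / fs)        # pole radius from bandwidth
        w = 2.0 * np.pi * f / fs           # pole angle from frequency
        poles += [r * np.exp(1j * w), r * np.exp(-1j * w)]
    return np.real(np.poly(poles))[1:]     # drop the leading coefficient 1


def synthesize_arx(a, b0, u):
    """Noise-free ARX output: s[n] = b0*u[n] - sum_k a_k s[n-k]."""
    p = len(a)
    s = np.zeros(len(u))
    for n in range(len(u)):
        acc = b0 * u[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                acc -= a[k - 1] * s[n - k]
        s[n] = acc
    return s


def estimate_arx(s, u, order):
    """Least-squares fit of a_1..a_p and b0, assuming the source u is known."""
    n = np.arange(order, len(s))
    # Regressors: negated past outputs (for a_k) and the current source sample (for b0).
    X = np.column_stack([-s[n - k] for k in range(1, order + 1)] + [u[n]])
    theta, *_ = np.linalg.lstsq(X, s[n], rcond=None)
    return theta[:order], theta[order]


def formants_from_ar(a, fs):
    """Formant frequencies and bandwidths (Hz) from the upper-half-plane roots of A(z)."""
    roots = np.roots(np.concatenate(([1.0], a)))
    roots = roots[np.imag(roots) > 0]
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    idx = np.argsort(freqs)
    return freqs[idx], bws[idx]


# Usage: synthesize an /i/-like two-resonance signal with a low F1, then re-estimate it.
fs = 10000
a_true = ar_poly_from_formants([300, 2300], [60, 120], fs)
u = np.random.default_rng(0).standard_normal(4000)   # stand-in for a glottal source
s = synthesize_arx(a_true, 1.0, u)
a_hat, b0_hat = estimate_arx(s, u, order=len(a_true))
freqs, bws = formants_from_ar(a_hat, fs)
```

Because the synthetic signal is generated exactly by the model and the source is known, the least-squares fit recovers the resonances essentially exactly; with real speech the source is unknown and must be estimated alongside the filter, which is the harder problem the abstract addresses.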
