Abstract

The aim of this paper is to evaluate the effectiveness of a class of data-driven physical models in representing both acoustic and high-speed video data of the voice production process. Voice production analysis through numerical models of the phonation process is nowadays a mature research field, and reliable dynamical glottal models of varying accuracy and complexity are available. Although such models are traditionally used to represent the acoustic emission during phonation, the biomechanical nature of the modeling makes them well suited to also represent high-speed video recordings of the vocal fold oscillations. We discuss here a data-driven, numerically simulated model of the fold motion within an audio-video data analysis context. A model structure is proposed that combines physical knowledge with data-driven machine learning components. A model inversion algorithm is designed that exploits acoustic data related to the glottal excitation, together with high-speed video data of the folds, to estimate the parameters of the model and to represent the phonation characteristics. It is shown how machine learning techniques can be used effectively in combination with biomechanical modeling in order to match the observed data. The method is assessed on data from different subjects uttering sustained vowels.
