Abstract

While physically based voice production models have potential applications in clinical intervention for voice disorders and in personalized natural speech synthesis, their current use is limited by the high computational cost of simulating the voice production process. In our previous studies [Zhang 2015, J. Acoust. Soc. Am. 137, 898], we developed a reduced-order voice synthesis program with significantly improved computational efficiency toward real-time applications. One of the simplifications is the use of vocal fold eigenmodes as building blocks to reconstruct more complex vocal fold vibration patterns, which significantly reduces the computational time, particularly if only a few eigenmodes are used in the simulations. The goal of this study is to identify the minimum number of eigenmodes that need to be included in order to achieve a balance between computational speed and fidelity in voice acoustics and voice quality. The results show that for most voice conditions as few as 30 eigenmodes are sufficient to accurately predict the fundamental frequency, vocal intensity, and selected spectral measures. It is expected that for applications in which absolute values are not as essential, an even smaller number of eigenmodes would be acceptable, allowing near-real-time capability. [Work supported by NIH.]
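The idea of rebuilding a vibration pattern from a truncated set of eigenmodes can be illustrated with a minimal sketch. It is not the synthesis program described above: it assumes a toy one-dimensional fixed-fixed chain whose eigenmodes are analytic sine shapes, and the names `sine_modes`, `reconstruct`, `relative_error`, and the Gaussian `target` pattern are all hypothetical. It only shows how the reconstruction error shrinks as more modes are retained.

```python
import math

def sine_modes(n_points, n_modes):
    """Orthonormal eigenmodes of a uniform fixed-fixed chain.

    Toy stand-in for vocal fold eigenmodes (an assumption for
    illustration, not the model used in the study).
    """
    scale = math.sqrt(2.0 / (n_points + 1))
    return [
        [scale * math.sin(math.pi * k * i / (n_points + 1))
         for i in range(1, n_points + 1)]
        for k in range(1, n_modes + 1)
    ]

def reconstruct(shape, modes):
    """Project `shape` onto each mode, then sum the weighted modes."""
    coeffs = [sum(m * s for m, s in zip(mode, shape)) for mode in modes]
    return [
        sum(c * mode[i] for c, mode in zip(coeffs, modes))
        for i in range(len(shape))
    ]

def relative_error(shape, approx):
    """Relative L2 error between the pattern and its reconstruction."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(shape, approx)))
    den = math.sqrt(sum(a * a for a in shape))
    return num / den

N_POINTS = 60
# Hypothetical displacement pattern: a localized bump along the chain.
target = [math.exp(-((i - 20) / 4.0) ** 2) for i in range(1, N_POINTS + 1)]

# Reconstruction error for increasing numbers of retained modes.
errors = {
    m: relative_error(target, reconstruct(target, sine_modes(N_POINTS, m)))
    for m in (5, 10, 30, 60)
}

for m, err in errors.items():
    print(f"{m:2d} modes: relative error = {err:.3e}")
```

Because the modes are orthonormal, adding modes can never increase the error in this toy setting, and the full set of 60 modes reproduces the pattern to rounding error. How many modes are enough in a real vocal fold model depends on the acoustic measures of interest, which is the question the study addresses.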
