Abstract

Frequency-domain simulations of the human vocal tract (VT) have previously shown the importance of including the piriform fossae, which impart a pole and two zeros in the 4–5-kHz frequency range and thereby contribute to speaker individuality. The literature has also shown that time-domain simulation of VT acoustics can yield high-quality synthesis that naturally includes interactions between the time-varying glottal area and the supraglottal VT. In the present work, the time-domain model of [S. Maeda, Speech Commun. 1, 199–229 (1982)] was extended to include both left and right piriform fossae as side branches connected to the main VT, in addition to the nasal tract and sinuses. Departing from Maeda’s original implementation owing to the complexity of including more than one side branch, the variables representing acoustic pressure and volume velocity at the piriform fossae and nasal tract junctions were analytically eliminated, and the resulting large system of linear equations was solved simultaneously at each simulation sample. This direct method runs at only a few times real time on a 1.8-GHz notebook PC, while achieving a more natural sound quality in speech synthesis and control over timbral (or voice quality) features that contribute to each speaker’s individuality. [Work supported by NiCT and SCOPE-R of Japan.]
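
As a rough illustration of why a short closed side branch can place spectral zeros in the 4–5-kHz range, the sketch below applies the textbook quarter-wavelength approximation, f = c / (4L), for a uniform, lossless tube closed at its far end. This is an idealization and not the time-domain model described above. The function name `side_branch_zero_hz`, the branch lengths, and the speed of sound (350 m/s) are assumed values chosen only for illustration.

```python
def side_branch_zero_hz(length_m, speed_of_sound=350.0):
    """First antiresonance (Hz) of a uniform, lossless, closed-end side branch.

    Quarter-wavelength approximation: f = c / (4 * L).
    The default speed of sound is an assumed value for warm, humid air.
    """
    return speed_of_sound / (4.0 * length_m)


# Hypothetical branch lengths in centimetres (illustrative, not measured data).
for length_cm in (1.8, 2.0, 2.1):
    zero_hz = side_branch_zero_hz(length_cm / 100.0)
    print(f"{length_cm:.1f} cm branch -> first zero near {zero_hz:.0f} Hz")
```

Under these assumptions, branches about 2 cm long give a first zero between 4 and 5 kHz. A left and a right branch of slightly different lengths would give two nearby zeros in that band.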

