The tube resonance model speech synthesizer

Leonard C Manzara

doi:10.1121/1.4788442

Abstract

The Tube Resonance Model (TRM) synthesizer is an articulatory speech synthesizer implemented in software. It directly emulates the resonant behavior of the oropharyngeal and nasal tracts using digital waveguides. The oropharyngeal cavity is subdivided into 8 regions of unequal length, where particular regions correspond to the human articulators of tongue, teeth, and mouth. The radius (and hence the cross-sectional area) of each region can be varied independently over time. Differences in radius between adjacent regions give rise to differences in acoustic impedance, which are modeled using two-way scattering junctions. The nasal cavity is composed of 5 equal-length sections and is connected to the vocal tract via another section (the velum) using a three-way scattering junction. The total length of the tube can be varied over a continuous range, allowing one to synthesize male, female, and juvenile voices.
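
As an illustration of the two-way scattering junction mentioned above, the following is a minimal sketch in Python of the standard Kelly-Lochbaum pressure-wave junction between two adjacent tube sections. It is not taken from the TRM source code: the function names `reflection_coefficient` and `scatter_two_way`, the sign convention, and the lossless-junction assumption are all illustrative choices made here.

```python
# Illustrative sketch only: a lossless Kelly-Lochbaum junction for pressure
# waves. Names and sign conventions are assumptions, not the TRM's own code.

def reflection_coefficient(r_left, r_right):
    """Reflection coefficient seen by a wave travelling left to right.

    Cross-sectional area is proportional to radius squared, and acoustic
    impedance is inversely proportional to area, so the factor of pi cancels.
    """
    a_left = r_left ** 2
    a_right = r_right ** 2
    return (a_left - a_right) / (a_left + a_right)


def scatter_two_way(f_in, b_in, k):
    """Scatter the two waves arriving at a junction.

    f_in: forward-travelling pressure wave arriving from the left section.
    b_in: backward-travelling pressure wave arriving from the right section.
    Returns (f_out, b_out): the wave continuing into the right section and
    the wave returning into the left section.
    """
    delta = k * (f_in - b_in)  # one-multiply form of the junction
    f_out = f_in + delta       # equals (1 + k) * f_in - k * b_in
    b_out = b_in + delta       # equals k * f_in + (1 - k) * b_in
    return f_out, b_out


if __name__ == "__main__":
    # A wide section (radius 1.0) feeding a narrow one (radius 0.5).
    k = reflection_coefficient(1.0, 0.5)
    f_out, b_out = scatter_two_way(1.0, 0.0, k)
    print(k, f_out, b_out)
```

When the two radii are equal the coefficient is zero and both waves pass through unchanged; the greater the mismatch in radius, the larger the share of the incoming wave that is reflected back into the section it came from.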
