Articulatory speech synthesizers model the human vocal tract by means of its geometrical and functional properties. It is believed that this approach can be advantageous in speech coding at bit rates below 4800 b/s. Existing articulatory synthesizers work in the time domain: they either solve a system of differential equations for the vocal tract and the glottis, or synthesize speech using wave digital filters. The first approach is computationally very expensive. Both approaches have difficulty incorporating important acoustic parameters, for example, the radiation impedance at the lips, wall vibration, and other losses. Moreover, no realistic glottis model suitable for the wave digital filter approach exists so far. We have combined a nonlinear time‐domain model of the vocal cords [K. Ishizaka and J. L. Flanagan, Bell Syst. Tech. J. 51, 1233–1268 (1972)] with a linear frequency‐domain chain‐matrix model of the vocal tract [e.g., M. M. Sondhi, paper No. 4.5.1, Proc. Int. Congress on Acoustics, Paris, France, 1983, Vol. 4, pp. 167–170]. The interface between these two models consists of convolving the glottal flow in the time domain with impulse responses of the tract obtained by inverse FFT. Examples of synthesized speech using manually generated and measured tract areas will be given.
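The interface step described above can be sketched as follows. This is a minimal illustration, not the authors' model: the tract frequency response is approximated here by a few assumed formant resonances (in the paper it would come from the chain-matrix computation), and the glottal flow by a simple pulse train (in the paper it comes from the Ishizaka–Flanagan two-mass model). All numerical values are assumptions for illustration only.

```python
import numpy as np

fs = 8000    # sampling rate in Hz (assumption)
nfft = 512   # FFT length (assumption)
freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)

# Toy vocal-tract frequency response: a sum of single-pole resonances
# near assumed formant frequencies of a neutral vowel. In the hybrid
# synthesizer this response would be computed from the chain matrices.
H = np.zeros_like(freqs, dtype=complex)
for fk, bw in [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]:
    H += 1.0 / (1.0 + 1j * (freqs - fk) / bw)

# Inverse FFT of the frequency response gives the tract impulse
# response -- the interface quantity named in the abstract.
h = np.fft.irfft(H, n=nfft)

# Toy glottal flow: a 100-Hz train of raised half-sine pulses, standing
# in for the output of the nonlinear vocal-cord model.
t = np.arange(nfft) / fs
glottal = np.maximum(0.0, np.sin(2.0 * np.pi * 100.0 * t)) ** 2

# Time-domain convolution of glottal flow with the impulse response
# yields the synthesized output signal.
speech = np.convolve(glottal, h)[:nfft]
```

In the actual hybrid scheme the impulse responses would be recomputed whenever the tract area function changes, so the frequency-domain tract model supplies a time-varying convolution kernel to the time-domain glottis simulation.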