Abstract

A multilayer perceptron has been trained to perform an analogue mapping from the power spectra of vowels and nasal consonants, spoken by a single speaker, to the control parameters of a speech synthesiser based on an acoustic tube model. The model represents the vocal tract by ten lossless sections with adjustable areas, coupled to a lossy nasal tract whose areas are fixed except for the first, which controls the degree of nasal coupling. The outputs of the neural network control these eleven areas, while its inputs are samples of the power spectrum that the synthesised speech is intended to copy. During training, the synthesiser is driven by exemplar sets of areas, and the resulting synthetic speech provides the input spectra for the net. After training, natural speech from the same speaker, restricted to this phoneme set, can be synthesised with good intelligibility.
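The training scheme described here is analysis-by-synthesis: exemplar area sets drive the synthesiser, and the resulting spectra become the network inputs with the areas as targets. The sketch below illustrates that loop under stated assumptions; it is not the paper's implementation. The "synthesiser" is stood in for by the log power response of a lossless tube built from Kelly–Lochbaum-style reflection coefficients, the lossy nasal branch and its coupling area are omitted (so only the ten oral areas are learned), and every size and name (N_FREQ, HIDDEN, the sampling of frequencies, the area range) is an assumption rather than a value from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SECTIONS = 10   # lossless oral-tract sections, as in the abstract
N_FREQ = 64       # power-spectrum samples fed to the net (assumed size)
HIDDEN = 32       # assumed hidden-layer width
LR, EPOCHS, N_TRAIN = 0.05, 500, 2000

def refl_to_lpc(r):
    """Step-up recursion: reflection coefficients -> LPC polynomial A(z)."""
    a = np.array([1.0])
    for k in r:
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
    return a

def tube_log_spectrum(areas):
    """Stand-in 'synthesiser': log power response of a lossless tube with
    the given section areas.  The paper's synthesiser also has a lossy
    nasal branch controlled by an eleventh area and realistic boundary
    terminations; both are idealised away here."""
    r = (areas[:-1] - areas[1:]) / (areas[:-1] + areas[1:])
    a = refl_to_lpc(r)
    w = np.linspace(0.05, np.pi - 0.05, N_FREQ)       # sampled frequencies
    A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ a   # A(e^{jw})
    return -2.0 * np.log(np.abs(A) + 1e-9)            # log |H|^2, H = 1/A

def random_areas(n):
    """Exemplar area sets (assumed: uniform over a plausible range)."""
    return rng.uniform(0.5, 8.0, size=(n, N_SECTIONS))

# drive the synthesiser with exemplar areas; spectra become net inputs
areas = random_areas(N_TRAIN)
spectra = np.stack([tube_log_spectrum(a) for a in areas])
x_mu, x_sd = spectra.mean(0), spectra.std(0) + 1e-9
y_mu, y_sd = areas.mean(0), areas.std(0) + 1e-9
X, Y = (spectra - x_mu) / x_sd, (areas - y_mu) / y_sd

# one-hidden-layer perceptron trained by backprop (MSE loss, plain SGD)
W1 = rng.normal(0, 0.3, (N_FREQ, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.3, (HIDDEN, N_SECTIONS)); b2 = np.zeros(N_SECTIONS)
for _ in range(EPOCHS):
    h = np.tanh(X @ W1 + b1)
    err = (h @ W2 + b2) - Y                     # gradient of squared error
    dh = (err @ W2.T) * (1 - h**2)              # backprop through tanh
    W2 -= LR * h.T @ err / len(X); b2 -= LR * err.mean(0)
    W1 -= LR * X.T @ dh / len(X); b1 -= LR * dh.mean(0)

# copy-synthesis check: recover areas from an unseen spectrum,
# resynthesise, and compare the two spectra
test = random_areas(1)[0]
s = (tube_log_spectrum(test) - x_mu) / x_sd
est = np.tanh(s @ W1 + b1) @ W2 + b2
est_areas = np.clip(est * y_sd + y_mu, 0.1, None)   # areas must stay positive
print("spectral RMS error:", np.sqrt(np.mean(
    (tube_log_spectrum(est_areas) - tube_log_spectrum(test))**2)))
```

Note the design point this makes concrete: the net never sees natural speech during training. It learns the spectrum-to-area inverse mapping entirely from synthetic exemplars, so copy-synthesis of natural speech succeeds only insofar as the tube model can reproduce the speaker's spectra, which is why the abstract restricts the claim to vowels and nasals from the same speaker.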


