Abstract

A multilayer perceptron has been trained to perform an analogue mapping from the power spectra of vowels and nasal consonants, spoken by a single speaker, to the control parameters of a speech synthesiser based on an acoustic tube model. The model represents the vocal tract by ten lossless sections with adjustable areas, coupled to a lossy nasal tract whose areas are fixed except for the first, which controls the degree of nasal coupling. The outputs of the neural network control these eleven areas, while its inputs are samples of the power spectrum that the synthesised speech is intended to copy. During training, the synthesiser is driven by exemplar sets of areas, and the resulting synthetic speech provides the input spectra for the net. After training, natural speech from the same speaker, restricted to this phoneme set, can be synthesised with good intelligibility.
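The training scheme described here is analysis-by-synthesis: exemplar area sets drive the synthesiser, and the resulting spectra become the network inputs with the areas as targets. The sketch below illustrates that loop under stated assumptions; it is not the paper's implementation. The "synthesiser" is stood in for by the log power response of a lossless tube built from Kelly–Lochbaum-style reflection coefficients, the lossy nasal branch and its coupling area are omitted (so only the ten oral areas are learned), and every size and name (N_FREQ, HIDDEN, the sampling of frequencies, the area range) is an assumption rather than a value from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SECTIONS = 10   # lossless oral-tract sections, as in the abstract
N_FREQ = 64       # power-spectrum samples fed to the net (assumed size)
HIDDEN = 32       # assumed hidden-layer width
LR, EPOCHS, N_TRAIN = 0.05, 500, 2000

def refl_to_lpc(r):
    """Step-up recursion: reflection coefficients -> LPC polynomial A(z)."""
    a = np.array([1.0])
    for k in r:
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
    return a

def tube_log_spectrum(areas):
    """Stand-in 'synthesiser': log power response of a lossless tube with
    the given section areas.  The paper's synthesiser also has a lossy
    nasal branch controlled by an eleventh area and realistic boundary
    terminations; both are idealised away here."""
    r = (areas[:-1] - areas[1:]) / (areas[:-1] + areas[1:])
    a = refl_to_lpc(r)
    w = np.linspace(0.05, np.pi - 0.05, N_FREQ)       # sampled frequencies
    A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ a   # A(e^{jw})
    return -2.0 * np.log(np.abs(A) + 1e-9)            # log |H|^2, H = 1/A

def random_areas(n):
    """Exemplar area sets (assumed: uniform over a plausible range)."""
    return rng.uniform(0.5, 8.0, size=(n, N_SECTIONS))

# drive the synthesiser with exemplar areas; spectra become net inputs
areas = random_areas(N_TRAIN)
spectra = np.stack([tube_log_spectrum(a) for a in areas])
x_mu, x_sd = spectra.mean(0), spectra.std(0) + 1e-9
y_mu, y_sd = areas.mean(0), areas.std(0) + 1e-9
X, Y = (spectra - x_mu) / x_sd, (areas - y_mu) / y_sd

# one-hidden-layer perceptron trained by backprop (MSE loss, plain SGD)
W1 = rng.normal(0, 0.3, (N_FREQ, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.3, (HIDDEN, N_SECTIONS)); b2 = np.zeros(N_SECTIONS)
for _ in range(EPOCHS):
    h = np.tanh(X @ W1 + b1)
    err = (h @ W2 + b2) - Y                     # gradient of squared error
    dh = (err @ W2.T) * (1 - h**2)              # backprop through tanh
    W2 -= LR * h.T @ err / len(X); b2 -= LR * err.mean(0)
    W1 -= LR * X.T @ dh / len(X); b1 -= LR * dh.mean(0)

# copy-synthesis check: recover areas from an unseen spectrum,
# resynthesise, and compare the two spectra
test = random_areas(1)[0]
s = (tube_log_spectrum(test) - x_mu) / x_sd
est = np.tanh(s @ W1 + b1) @ W2 + b2
est_areas = np.clip(est * y_sd + y_mu, 0.1, None)   # areas must stay positive
print("spectral RMS error:", np.sqrt(np.mean(
    (tube_log_spectrum(est_areas) - tube_log_spectrum(test))**2)))
```

Note the design point this makes concrete: the net never sees natural speech during training. It learns the spectrum-to-area inverse mapping entirely from synthetic exemplars, so copy-synthesis of natural speech succeeds only insofar as the tube model can reproduce the speaker's spectra, which is why the abstract restricts the claim to vowels and nasals from the same speaker.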


