Abstract

Reproduced speech extracted from voice recordings of a few selected individuals, chosen to represent a wide range of ages, English dialects, and both sexes, was used as input to a four‐layer (input, hidden, output, and state) generalized backward‐propagation artificial neural network system (ANS) employing a linear predictive coding (LPC) model to produce more natural‐sounding artificial speech. The MIT‐developed speech phonetic research environment (SPIRE) software package, running on a Symbolics computer, was used to make the original voice recordings. The linear predictive analysis of the LPC model efficiently represented the speech signals in terms of slowly varying parameters, converting the combined spectral contributions of the glottal flow within a pitch period, the vocal tract, and the radiation at the lips into a single recursive (all‐pole) time‐varying filter. The transfer function of this filter involved gain coefficients that, when fed into the system, were found to destabilize it; the LPC model was therefore modified to use the impulse characteristics of the system rather than the individual discrete coefficients. Preliminary data showed close matches between the artificial (output) speech and the natural (input) speech, indicating the network's ability to learn the timing, pitch fluctuations, connectivity between individual sounds, and speaking habits unique to a person.
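The all‐pole LPC analysis and resynthesis the abstract refers to can be sketched in a few lines. The following Python is a hypothetical illustration, not the paper's code: autocorrelation analysis plus the Levinson‐Durbin recursion estimates the predictor coefficients a_k, and the recursive (all‐pole) filter s[n] = e[n] + Σ a_k·s[n−k] regenerates a signal from an excitation sequence. The function names and the toy first‐order example are my own.

```python
def autocorr(x, order):
    """Autocorrelation lags r[0..order] of a windowed frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations for predictor coefficients.

    Returns (coeffs a_1..a_p, residual prediction error), where the
    prediction error relates to the filter gain term in the transfer
    function H(z) = G / (1 - sum_k a_k z^-k).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

def synthesize(excitation, coeffs):
    """All-pole (recursive) synthesis: s[n] = e[n] + sum_k a_k * s[n-k]."""
    p = len(coeffs)
    s = []
    for n, e_n in enumerate(excitation):
        s.append(e_n + sum(coeffs[k] * s[n - 1 - k]
                           for k in range(p) if n - 1 - k >= 0))
    return s
```

For example, a decaying exponential s[n] = 0.5ⁿ is the impulse response of a first‐order all‐pole filter, so an order‐1 analysis of it recovers a_1 ≈ 0.5, and feeding an impulse back through `synthesize` reproduces the original signal.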
