Abstract

A formant tracking algorithm has been developed to provide control signals automatically for a parallel formant speech synthesizer. In voiced sounds, the algorithm first estimates the positions of high-energy excitation in the sampled speech waveform to be analyzed. These positions generally correspond to the points of glottal closure, and the interval following each position should correspond to a closed-glottis region. For each such position, the speech samples in the following few milliseconds are used to calculate a log power spectrum. An analysis-by-synthesis procedure iteratively updates estimates of formant frequencies and amplitudes using spectral matching criteria. A similar spectral-matching procedure is used during unvoiced sounds, but time averaging is introduced as appropriate when deriving the spectra to be matched. To cope with situations in which the formant structure is not clearly defined in the power spectrum, the iteration is performed separately for several different allocations of formants to spectral peaks. A final choice of formant frequencies and amplitudes is made every 10 ms and depends on both goodness of spectral match and continuity of formant tracks. Demonstrations will be made of the speech quality achieved for a number of male talkers.
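
As an illustration of the step in which a log power spectrum is calculated from the few milliseconds of speech following an excitation position, the following minimal sketch may help. It is not the authors' implementation: the abstract does not specify the window shape, window length, transform size, or how the excitation positions are found, so the Hamming window, the zero-padded direct DFT, the decibel floor, and the names `log_power_spectrum`, `start`, `window_len`, `n_fft`, and `floor_db` are all assumptions made for this example. The excitation position is taken as a given sample index.

```python
import cmath
import math


def log_power_spectrum(samples, start, window_len, n_fft=256, floor_db=-120.0):
    """Return a log power spectrum (in dB) for a short segment of speech.

    Hypothetical helper, not taken from the paper. `start` stands in for an
    estimated excitation position (assumed to be known already), and
    `window_len` is the number of samples in the "following few milliseconds".
    """
    segment = samples[start:start + window_len]
    n = len(segment)
    if n < 2:
        raise ValueError("segment must contain at least two samples")
    if n > n_fft:
        raise ValueError("n_fft must be at least as long as the segment")

    # Assumed taper: a Hamming window to reduce spectral leakage.
    windowed = [
        s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, s in enumerate(segment)
    ]

    spectrum_db = []
    # Direct DFT of the zero-padded segment, non-negative frequencies only.
    for k in range(n_fft // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n_fft)
            for i, x in enumerate(windowed)
        )
        power = abs(acc) ** 2
        # Clamp to a floor so that silent segments do not produce log(0).
        level = 10.0 * math.log10(power) if power > 0.0 else floor_db
        spectrum_db.append(max(floor_db, level))
    return spectrum_db
```

For speech sampled at 8 kHz, `log_power_spectrum(x, start, 40)` would cover 5 ms after the chosen position, with bin `k` corresponding to `k * 8000 / 256` Hz. A spectrum of this kind is what a spectral-matching stage would then compare against the synthesizer's output; that matching stage is not sketched here because the abstract gives no detail of its criteria.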
