Abstract

It has been shown that speech of high quality can be synthesized using a sinusoidal model when the amplitudes, frequencies, and phases are derived from a high-resolution analysis of the short-time Fourier transform (STFT). It has also been shown that if the measured sine-wave frequencies are replaced by a harmonic set of frequencies, with the fundamental frequency chosen to make the harmonic model a "best fit" to the measured sine-wave data, then high-quality synthetic speech can still be obtained, provided the amplitudes and phases are obtained by sampling the STFT at the harmonic frequencies. A model has also been developed for the sine-wave phases that has a linear component corresponding to the onset time of the glottal pulse, a minimum-phase component due to the dispersive characteristics of the vocal tract, and a random component that represents the degree to which the speech segment is unvoiced. While conventional methods are used for coding the pitch and voicing, the sine-wave amplitudes are coded using high-order all-pole models. Scalar quantization of the line spectral frequencies offers good performance at rates from 4800 to 8000 bps, while a multiband vector quantizer performs quite well at 2400 bps.
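
The harmonic variant described above (amplitudes and phases obtained by sampling the spectrum at multiples of the fundamental) can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' analysis system: the fundamental `f0` is taken as known rather than estimated by a best-fit search, the window is rectangular and spans an integer number of pitch periods, and the function names `analyze_harmonics` and `synthesize_harmonics` are hypothetical.

```python
import cmath
import math


def synthesize_harmonics(params, f0, fs, n_samples):
    """Sum of cosines at k*f0, one (amplitude, phase) pair per harmonic k = 1, 2, ..."""
    return [
        sum(
            amp * math.cos(2.0 * math.pi * k * f0 * n / fs + phase)
            for k, (amp, phase) in enumerate(params, start=1)
        )
        for n in range(n_samples)
    ]


def analyze_harmonics(frame, f0, fs, num_harmonics):
    """Sample the frame's Fourier transform at k*f0 and return (amplitude, phase) pairs.

    Assumption: rectangular window covering an integer number of pitch periods,
    so the harmonics are mutually orthogonal over the frame.
    """
    n_samples = len(frame)
    params = []
    for k in range(1, num_harmonics + 1):
        omega = 2.0 * math.pi * k * f0 / fs  # radians per sample
        spectrum = sum(x * cmath.exp(-1j * omega * n) for n, x in enumerate(frame))
        # For a real cosine, the spectral sample equals (N / 2) * A * exp(j * phase).
        params.append((2.0 * abs(spectrum) / n_samples, cmath.phase(spectrum)))
    return params


# Illustrative values only: 8 kHz sampling, 200 Hz pitch, 10 pitch periods per frame.
fs, f0, n_samples = 8000.0, 200.0, 400
true_params = [(1.0, 0.3), (0.5, -1.2), (0.25, 2.0)]

frame = synthesize_harmonics(true_params, f0, fs, n_samples)
estimated = analyze_harmonics(frame, f0, fs, len(true_params))
resynthesized = synthesize_harmonics(estimated, f0, fs, n_samples)
max_error = max(abs(a - b) for a, b in zip(frame, resynthesized))
```

On this synthetic frame the estimated amplitudes and phases match the ones used to build it, and the resynthesized frame agrees with the input to within floating-point error. Real speech would need a tapered analysis window, a pitch estimator, and frame-to-frame interpolation of the parameters, none of which are shown here.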
