Abstract

An efficient representation of the LPC excitation is essential in predictive coding systems for synthesizing high-quality speech at low bit rates. In this paper, a method is presented that takes advantage of the nonuniform spacing of auditory critical bands to achieve an efficient frequency-domain representation of LPC excitation. A segment of LPC excitation with a duration of N samples, represented as a Fourier series, requires N/2 sinusoidal components uniformly spaced along the frequency axis for exact reproduction of the excitation. Thus, for speech bandlimited to 4 kHz, a 10-ms segment requires 40 frequency components for exact reproduction. It was found that, by using uniform frequency spacing below 1 kHz and logarithmic spacing above 1 kHz, the number of sinusoidal components can be reduced to 15 without introducing any audible distortion in the synthetic speech signal. Subjective tests were conducted to determine the effective signal-to-noise ratio of synthetic speech for different numbers of sinusoidal components. These results and a tape demonstration of synthetic speech will be presented at the meeting.
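The component counts quoted above can be checked with a little arithmetic. The sketch below is illustrative only and is not the authors' procedure. It assumes Nyquist-rate sampling (8 kHz for a 4 kHz bandwidth), counts harmonics of the 1/T = 100 Hz segment fundamental, and places the split between uniform and logarithmic spacing so that 10 components lie at or below 1 kHz and 5 lie above. The abstract gives only the total of 15, so the 10/5 split, the geometric placement above 1 kHz, and the function names are all assumptions.

```python
import math


def uniform_component_count(bandwidth_hz: float, duration_s: float) -> int:
    """Number of uniformly spaced sinusoids (N/2) for an N-sample segment.

    Assumes sampling at the Nyquist rate, i.e. fs = 2 * bandwidth.
    """
    fs = 2.0 * bandwidth_hz
    n_samples = round(fs * duration_s)
    return n_samples // 2


def hybrid_grid(bandwidth_hz: float = 4000.0,
                duration_s: float = 0.010,
                split_hz: float = 1000.0,
                n_log: int = 5) -> list[float]:
    """Hypothetical frequency grid: uniform up to split_hz, geometric above.

    The value n_log = 5 is an assumption chosen so the total comes to 15;
    the abstract does not state how the 15 components are distributed.
    """
    spacing = 1.0 / duration_s                  # harmonic spacing, 100 Hz for 10 ms
    n_uniform = round(split_hz / spacing)       # harmonics at or below the split
    uniform = [spacing * k for k in range(1, n_uniform + 1)]
    ratio = bandwidth_hz / split_hz             # span covered by the log region
    log_part = [split_hz * ratio ** (k / n_log) for k in range(1, n_log + 1)]
    return uniform + log_part


if __name__ == "__main__":
    print(uniform_component_count(4000.0, 0.010))   # 40
    grid = hybrid_grid()
    print(len(grid))                                # 15
    print([round(f) for f in grid])
```

Under these assumptions the grid keeps every 100 Hz harmonic up to 1 kHz and then widens the spacing by a constant ratio up to 4 kHz, which mirrors the way critical bandwidths grow with frequency.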


