Abstract

Conventional LPC speech synthesis uses two separate excitation signals: a delta-function pulse once every pitch period for voiced speech, and white noise for unvoiced speech. This representation requires that speech segments be classified accurately as voiced or unvoiced and that the pitch period of voiced segments be known. It is now well recognized that such a rigid idealization of the excitation is often responsible for the unnatural quality of synthesized speech. We find that a more flexible representation of the excitation is necessary for producing natural-sounding speech. This paper presents an analysis-by-synthesis procedure for determining the optimal excitation for LPC synthesis (at different bit rates) without requiring prior knowledge of either the voiced-unvoiced classification or the pitch period. The excitation is found by minimizing the perceptual difference between the waveforms of the original and synthetic speech signals using a noniterative procedure. The perceptual difference metric accounts for the finite frequency resolution and the masking properties of the human hearing mechanism.
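
The analysis-by-synthesis idea can be illustrated with a minimal sketch. It is not the paper's procedure, and every detail below is an assumption: the excitation is modeled as a few pulses per frame, the perceptual metric is approximated by the weighting filter W(z) = A(z)/A(z/gamma), pulses are chosen one at a time with a closed-form amplitude, and the filter state carried over from earlier frames is ignored. The names `find_excitation`, `n_pulses`, and `gamma` are hypothetical.

```python
import numpy as np

def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient is scaled by gamma**k."""
    return a * gamma ** np.arange(len(a))

def all_pole_filter(a, x):
    """Filter x through 1/A(z) with a[0] == 1 and zero initial state."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def fir_filter(b, x):
    """Filter x through the FIR filter b, truncated to the frame length."""
    return np.convolve(b, x)[:len(x)]

def find_excitation(speech, a, n_pulses, gamma=0.8):
    """Greedy pulse search minimizing a weighted waveform error (assumed form)."""
    n = len(speech)
    a_w = bandwidth_expand(a, gamma)
    # Weighted target: original speech passed through W(z) = A(z)/A(z/gamma).
    target = all_pole_filter(a_w, fir_filter(a, speech))
    # A pulse passed through the synthesis filter 1/A(z) and then W(z)
    # sees the combined response 1/A(z/gamma).
    impulse = np.zeros(n)
    impulse[0] = 1.0
    h = all_pole_filter(a_w, impulse)
    # Energy of the (frame-truncated) response for a pulse at each position.
    energy = np.cumsum(h ** 2)[::-1]

    excitation = np.zeros(n)
    err = target.copy()
    for _ in range(n_pulses):
        # Correlation of the remaining error with the response at each position.
        corr = np.array([err[m:] @ h[:n - m] for m in range(n)])
        m = int(np.argmax(corr ** 2 / energy))  # largest error reduction
        amp = corr[m] / energy[m]               # least-squares amplitude
        excitation[m] += amp
        err[m:] -= amp * h[:n - m]
    return excitation, err
```

No voiced-unvoiced decision or pitch estimate appears anywhere in the search: pulse positions and amplitudes follow only from the weighted error between the original and synthetic waveforms. Increasing `n_pulses` trades bit rate for a smaller error.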
