Abstract

An important goal in current speech coding research is providing high-quality speech at low bit rates (4.8–16 kbps). Several methods [1]–[3] have been proposed recently to achieve this end. Compared to the conventional linear predictive (LP) vocoder [4], these methods employ an enhanced speech production model to synthesize speech. For example, instead of a single stage, the synthesis filter now typically consists of two stages: i) a short-delay filter modeling the spectral envelope of speech, and ii) a long-delay filter modeling the spectral fine structure. Both are time-varying, all-pole filters derived from the original speech through LP analysis. In addition, some information is provided about the excitation signal, which is selected by an analysis-by-synthesis procedure that minimizes a perceptually weighted error criterion. In the multi-pulse linear predictive coder (MPLPC) [1], the excitation signal is a sequence of appropriately located and scaled impulses. In the code excited linear predictive coder (CELPC) [2], it is an entry from a codebook of white Gaussian noise sequences. In the self excited vocoder (SEV) [3], it is selected from the past history of the source excitation. As a result of these improvements, the above coders are able to synthesize high-quality speech at low bit rates.
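The two-stage filter and the analysis-by-synthesis selection described above can be illustrated with a minimal Python sketch. It is not the implementation of any of the cited coders. All function names, filter coefficients, and frame sizes are hypothetical. The filters start from zero state on every frame, and a plain squared error stands in for the perceptually weighted criterion, both of which are simplifying assumptions.

```python
import random


def long_delay_filter(excitation, gain, delay):
    """All-pole long-delay filter: y[n] = x[n] + gain * y[n - delay]."""
    out = []
    for n, x in enumerate(excitation):
        past = out[n - delay] if n >= delay else 0.0
        out.append(x + gain * past)
    return out


def short_delay_filter(signal, lp_coeffs):
    """All-pole short-delay filter: y[n] = x[n] + sum_k a_k * y[n - k]."""
    out = []
    for n, x in enumerate(signal):
        acc = x
        for k, a in enumerate(lp_coeffs, start=1):
            if n >= k:
                acc += a * out[n - k]
        out.append(acc)
    return out


def synthesize(excitation, lp_coeffs, pitch_gain, pitch_delay):
    """Cascade: excitation -> long-delay stage -> short-delay stage."""
    fine = long_delay_filter(excitation, pitch_gain, pitch_delay)
    return short_delay_filter(fine, lp_coeffs)


def search_codebook(target, codebook, lp_coeffs, pitch_gain, pitch_delay):
    """Analysis-by-synthesis search over a codebook of excitations.

    Each entry is synthesized, scaled by its least-squares gain, and
    compared with the target. Returns (index, gain, squared_error) of
    the best entry. Assumption: unweighted squared error is used here
    in place of a perceptually weighted one.
    """
    best = None
    for idx, entry in enumerate(codebook):
        synth = synthesize(entry, lp_coeffs, pitch_gain, pitch_delay)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        err = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or err < best[2]:
            best = (idx, gain, err)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical codebook: 16 white Gaussian sequences of 40 samples.
    codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(16)]
    lp = [0.9, -0.2]  # illustrative, stable short-delay coefficients
    target = synthesize(codebook[5], lp, 0.5, 8)
    print(search_codebook(target, codebook, lp, 0.5, 8))
```

In this sketch the codebook search corresponds most closely to the CELPC case. Replacing the codebook with delayed segments of past excitation, or with scaled impulses at candidate positions, would give rough analogues of the SEV and MPLPC excitations.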
