Robust waveform interpolation for the entire speech signal

W. Bastiaan Kleijn, Jesper Haagen

doi:10.1121/1.409726

Abstract

In waveform interpolation (WI), a speech signal is reconstructed by concatenating infinitesimal segments of an evolving characteristic waveform, which is obtained by interpolation over time [W. B. Kleijn, IEEE Trans. Speech Audio Process. 1, 386–399 (1993)]. WI leads to efficient coding of voiced speech, but current implementations switch to code-excited linear prediction (CELP) for nonperiodic signals. Here, the WI paradigm is extended to provide an effective basis for coding voiced speech, unvoiced speech, and background noise. Prototype waveforms are extracted every 2.5 ms; at this high rate, the WI analysis–synthesis system achieves transparent speech quality. Each prototype waveform is decomposed into a slowly evolving waveform (SEW), obtained by convolution in time with a 40-ms smoothing window, and a remainder, the rapidly evolving waveform (REW). Because the SEW has low bandwidth, a low bit rate suffices to encode it (additional processing lowers the bit rate further), while the REW requires only a rough statistical description (e.g., its phase spectrum can be randomized). The new paradigm facilitates efficient, robust speech coding in the range 2–8 kb/s.
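
The synthesis idea in the first sentence, concatenating infinitesimal segments of an evolving characteristic waveform, can be illustrated with a minimal Python sketch. It assumes a Fourier-series representation of the characteristic waveform, u(t, φ) = Σ_k [A_k(t) cos kφ + B_k(t) sin kφ], which is a common WI formulation taken here as an assumption, together with linear interpolation of the coefficients and of the pitch period between two transmitted waveforms. The function names, the fixed harmonic count, and the linear interpolation rule are hypothetical choices for illustration, not the authors' implementation. Each output sample evaluates the interpolated waveform at the current phase, and the phase then advances by 2π divided by the interpolated pitch period.

```python
import math
from typing import List, Sequence, Tuple


def interpolate(a: float, b: float, alpha: float) -> float:
    """Linear interpolation: returns a at alpha = 0 and b at alpha = 1."""
    return (1.0 - alpha) * a + alpha * b


def wi_synthesize_segment(
    coeffs0: Sequence[Tuple[float, float]],
    coeffs1: Sequence[Tuple[float, float]],
    period0: float,
    period1: float,
    num_samples: int,
    phase0: float = 0.0,
) -> Tuple[List[float], float]:
    """Synthesize num_samples samples between two characteristic waveforms.

    coeffs0 / coeffs1 hold (A_k, B_k) for harmonics k = 1..K at the start and
    end of the interval; period0 / period1 are pitch periods in samples.
    Returns the samples and the final phase (mod 2*pi) for the next segment.
    """
    if len(coeffs0) != len(coeffs1):
        raise ValueError("both waveforms need the same number of harmonics")
    out: List[float] = []
    phase = phase0
    for n in range(num_samples):
        alpha = n / num_samples  # position within the interpolation interval
        # Evaluate the interpolated waveform u(t, phi) at the current phase;
        # each output sample is one "infinitesimal segment" of it.
        value = 0.0
        for k, ((a0, b0), (a1, b1)) in enumerate(zip(coeffs0, coeffs1), start=1):
            a = interpolate(a0, a1, alpha)
            b = interpolate(b0, b1, alpha)
            value += a * math.cos(k * phase) + b * math.sin(k * phase)
        out.append(value)
        # Advance the phase by 2*pi per (interpolated) pitch period.
        phase += 2.0 * math.pi / interpolate(period0, period1, alpha)
    return out, math.fmod(phase, 2.0 * math.pi)
```

For instance, with a single harmonic (A_1 = 1, B_1 = 0) and a constant 40-sample period, `wi_synthesize_segment([(1.0, 0.0)], [(1.0, 0.0)], 40.0, 40.0, 20)` produces half a cycle of a cosine and returns a final phase of about π. At an assumed 8 kHz sampling rate, 20 samples correspond to the 2.5-ms prototype spacing. Returning the phase lets the next segment start where this one ended, so the concatenated output has no phase jumps.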

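The SEW/REW split described in the abstract can also be sketched in Python. At one prototype every 2.5 ms, a 40-ms window spans 16 prototype intervals, so the sketch smooths with 17 taps centered on the current prototype. It assumes the prototypes have already been time-aligned and normalized to a common length, and it uses a Hann-shaped window whose weights are renormalized at the ends of the sequence; the abstract specifies neither the window shape nor the edge handling, so both are assumptions. The REW is whatever the smoothing removes. `randomize_phase` then keeps the REW magnitude spectrum and draws uniform random phases, one possible form of the rough statistical description mentioned above. All function names are hypothetical.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple

Waveform = List[float]  # one time-aligned, length-normalized prototype


def hann_weights(num_taps: int) -> List[float]:
    """Strictly positive Hann-shaped weights (assumed window shape)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * (j + 1) / (num_taps + 1))
            for j in range(num_taps)]


def decompose_sew_rew(prototypes: Sequence[Waveform], spacing_ms: float = 2.5,
                      window_ms: float = 40.0) -> Tuple[List[Waveform], List[Waveform]]:
    """Smooth the prototype sequence over time (SEW); the remainder is the REW."""
    length = len(prototypes[0]) if prototypes else 0
    if any(len(p) != length for p in prototypes):
        raise ValueError("prototypes must share a common (aligned) length")
    num_taps = int(round(window_ms / spacing_ms)) + 1  # 17 taps: 40 ms / 2.5 ms + 1
    weights = hann_weights(num_taps)
    half = num_taps // 2
    sew: List[Waveform] = []
    for m in range(len(prototypes)):
        acc = [0.0] * length
        total = 0.0
        for j, w in enumerate(weights):
            idx = m + j - half  # prototype index under this window tap
            if 0 <= idx < len(prototypes):  # skip taps beyond the edges ...
                total += w
                for i, x in enumerate(prototypes[idx]):
                    acc[i] += w * x
        sew.append([a / total for a in acc])  # ... and renormalize the rest
    rew = [[x - s for x, s in zip(p, q)] for p, q in zip(prototypes, sew)]
    return sew, rew


def dft(x: Sequence[complex]) -> List[complex]:
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def randomize_phase(rew: Waveform, rng: random.Random) -> Waveform:
    """Keep the magnitude spectrum of a real REW and draw new random phases."""
    n = len(rew)
    spec = dft(rew)
    new = list(spec)  # DC (and, for even n, Nyquist) bins are kept as-is
    for k in range(1, (n + 1) // 2):
        new[k] = cmath.rect(abs(spec[k]), rng.uniform(0.0, 2.0 * math.pi))
        new[n - k] = new[k].conjugate()  # Hermitian symmetry keeps the output real
    return [v.real for v in idft(new)]
```

Because the window weights are normalized to sum to one, a run of identical prototypes passes into the SEW unchanged and leaves a zero REW, while prototype-to-prototype fluctuations faster than the window can follow end up in the REW. The plain DFT keeps the sketch dependency-free; an FFT would be the natural choice in practice.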