On 450-600 b/s natural sounding speech coding

Y. M. Cheng, D. O'Shaughnessy

doi:10.1109/89.222879

Abstract

Algorithms for encoding speech with good intelligibility and naturalness at very low rates are studied. Naturalness is retained by accurately encoding the speech excitation information from an LPC (linear predictive coding) model. A glottal ARX (autoregressive with exogenous input) technique is used to model the speech signal for high quality. A large reduction in coding rate is achieved through short-term temporal decomposition of the speech and vector quantization. Application of traditional vector quantization to the temporal decomposition output is discussed, with consideration of distortion measures and codebook generation. Based on properties of short-term temporal decomposition, finite-state vector quantization is introduced to further decrease the coding rate. A problem associated with this technique, estimation of a state transition matrix with incomplete data, is treated. The general result is that practical coders operating in a range of 450-600 b/s with a delay of about 200 ms and natural-sounding output speech can be designed.
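As a rough illustration of why finite-state vector quantization can lower the rate, the sketch below estimates a state transition matrix from a sequence of codeword indices and then restricts each state to its most likely successors. It is not the authors' procedure: the additive smoothing constant `alpha` is an assumed stand-in for handling transitions missing from the training data, and the function names, the toy sequence, and the codebook sizes are all hypothetical.

```python
import math


def estimate_transitions(index_seq, n_codewords, alpha=1.0):
    """Estimate a row-stochastic transition matrix from codeword indices.

    alpha is an additive-smoothing constant (an assumption, not from the
    paper) so that transitions unseen in training keep nonzero probability.
    """
    counts = [[alpha] * n_codewords for _ in range(n_codewords)]
    for prev, cur in zip(index_seq, index_seq[1:]):
        counts[prev][cur] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def state_codebooks(trans, k):
    """For each previous codeword (state), keep the k most likely successors."""
    return [
        sorted(range(len(row)), key=lambda j: -row[j])[:k]
        for row in trans
    ]


def bits_per_vector(n_entries):
    """Bits needed to index one entry of a codebook with n_entries entries."""
    return math.ceil(math.log2(n_entries))


# Hypothetical toy data: a 4-entry codebook and a short index sequence.
seq = [0, 1, 2, 0, 1, 2, 0, 1, 3]
P = estimate_transitions(seq, n_codewords=4)
sub = state_codebooks(P, k=2)

full_bits = bits_per_vector(4)   # index into the full codebook
state_bits = bits_per_vector(2)  # index into a per-state sub-codebook
```

In this toy setting each vector costs `state_bits` instead of `full_bits` bits, at the price of extra distortion whenever the true successor falls outside the state's sub-codebook.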
