Code-excited linear prediction (CELP) is the predominant methodology for communications-quality speech coding below 8 kbps, and several variable-rate CELP schemes have been discussed in the literature, including QCELP, the variable-rate wideband digital cellular mobile radio speech coding standard specified in IS-95. A key component of these speech coders is the detection and classification of speech activity, and several cues for rate variation have been studied, such as measuring the short-term speech energy, deciding whether the speech is voiced or unvoiced, or making more sophisticated phonetic classifications. We present a new method for rate variation based on a measure of subband spectral flatness, called spectral entropy. Spectral entropy is a normalized indicator of the texture of the input spectrum and is thus less dependent on variations in speech and background noise energy. We present some results on the use of spectral entropy for voice activity detection across subbands and then evaluate its use in deriving mode and rate allocation cues for a variable-rate CELP coder operating at an average rate of 2 kbps. To achieve communications-quality speech at this rate, we develop a new split-band vector quantization (VQ) technique for representing the line spectral pairs and a multiple-codebook approach, called lag-indexed VQ, for efficiently quantizing the coefficients of a three-tap pitch predictor.
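To make the notion of a normalized, energy-independent flatness measure concrete, the following is a minimal sketch of one common way to compute a spectral entropy per subband. It is an illustration under stated assumptions, not the definition used in this work: the abstract does not give the formula, so the Shannon entropy of the normalized power spectrum, the scaling by the log of the bin count, the equal-width band split, the zero value returned for silent bands, and all function and parameter names (`power_spectrum`, `spectral_entropy`, `subband_entropies`, `num_bands`, `eps`) are assumptions introduced here.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum over the non-negative frequency bins (naive O(N^2) DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(power, eps=1e-12):
    """Shannon entropy of the normalized power values, scaled to [0, 1].

    Assumed definition: values near 1 indicate a flat (noise-like) spectrum,
    values near 0 indicate power concentrated in a few bins.
    """
    total = sum(power)
    if total <= eps or len(power) < 2:
        # Assumed convention: a silent or single-bin band reports 0.
        return 0.0
    entropy = 0.0
    for p in power:
        q = p / total  # normalizing by total power removes the gain
        if q > 0.0:
            entropy -= q * math.log(q)
    return entropy / math.log(len(power))


def subband_entropies(frame, num_bands=4):
    """Entropy of each of num_bands equal-width bands (hypothetical split)."""
    power = power_spectrum(frame)
    size = len(power) // num_bands
    result = []
    for b in range(num_bands):
        start = b * size
        # The last band absorbs any leftover bins.
        end = len(power) if b == num_bands - 1 else start + size
        result.append(spectral_entropy(power[start:end]))
    return result
```

Under these assumptions, a frame containing a single sinusoid gives a full-band value close to 0, a unit impulse (which has a flat spectrum) gives a value close to 1, and scaling the frame by a constant gain leaves the value unchanged, which is the property that makes such a measure less sensitive to signal and noise level than a short-term energy threshold.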