Abstract

We describe a new stochastic model for generating speech signals suitable for coding at low bit rates. In this model, the speech waveform is represented as a zero-mean Gaussian process with a slowly varying power spectrum. The optimum innovation sequence is obtained by minimizing a subjective error criterion based on properties of human auditory perception. Each block of 40 samples of the innovation signal (representing 5 ms of the speech signal sampled at 8 kHz) is coded as one of 1024 randomly generated Gaussian sequences of length 40. The chosen sequence is the one that minimizes a spectrally weighted error criterion. Because selecting one of 1024 sequences takes a 10-bit index every 5 ms, the innovation signal is encoded at 2 kbit/s. A time-varying linear filter, whose parameters are determined directly from the speech signal, is used to produce the desired power spectrum. Even at this low bit rate, the resynthesized speech is barely distinguishable from the original.
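
The bit-rate arithmetic and the codebook search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The block length, codebook size, and sampling rate come from the abstract; everything else is an assumption made for illustration: the function names, the one-pole filter standing in for the time-varying synthesis and perceptual weighting filters, the per-entry optimal gain, and the choice to reset the filter state for every block.

```python
import math
import random

BLOCK_LEN = 40        # samples per block (5 ms at 8 kHz, from the abstract)
CODEBOOK_SIZE = 1024  # candidate Gaussian sequences (from the abstract)
SAMPLE_RATE = 8000    # Hz (from the abstract)


def innovation_bit_rate(codebook_size=CODEBOOK_SIZE, block_len=BLOCK_LEN,
                        sample_rate=SAMPLE_RATE):
    """Bits per second needed to send one codebook index per block."""
    bits_per_block = math.log2(codebook_size)
    blocks_per_second = sample_rate / block_len
    return bits_per_block * blocks_per_second


def make_codebook(rng, size=CODEBOOK_SIZE, length=BLOCK_LEN):
    """Build a codebook of zero-mean, unit-variance Gaussian sequences."""
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]


def one_pole_filter(x, a=0.9):
    """Hypothetical stand-in for the synthesis and weighting filters.

    Assumption: the real filters are time-varying and derived from the
    speech signal; this fixed filter only makes the sketch runnable.
    """
    y, prev = [], 0.0
    for sample in x:
        prev = sample + a * prev
        y.append(prev)
    return y


def search_codebook(target, codebook, filt=one_pole_filter):
    """Exhaustively search the codebook.

    Returns (index, gain, error) for the entry whose filtered, gain-scaled
    version has the smallest squared error against `target`. The gain term
    is an assumption of this sketch, not something the abstract specifies.
    """
    best = (None, 0.0, float("inf"))
    target_energy = sum(t * t for t in target)
    for index, entry in enumerate(codebook):
        y = filt(entry)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # a silent entry cannot match anything
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy                          # least-squares gain
        error = target_energy - corr * corr / energy  # residual at that gain
        if error < best[2]:
            best = (index, gain, error)
    return best
```

Calling `innovation_bit_rate()` with the defaults reproduces the 2 kbit/s figure: 10 bits per block times 200 blocks per second. In `search_codebook`, `target` is assumed to be one 40-sample block of the signal to be matched, already passed through the same weighting as the candidates.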
