The performance of a variable-rate code-excited linear prediction (CELP) system is investigated. The coding system is based on a finite-state CELP (FSCELP) framework. Each state is characterised primarily by its LPC model order, the bit allocation for the LPC coefficients, the population density of its excitation codebook, and its encoding rate. Successive input speech vectors are encoded at a rate that depends on the current state of the FSCELP system and on the characteristics of the input vector. The use of a finite-state system involves implicit clustering of speech signals. The lower-rate states are selected during highly correlated, steady-state speech segments, when relatively few bits are required to obtain adequate fidelity. Speech signals with strong glottal excitation, unvoiced signals and transient speech segments need greater quantisation accuracy to obtain good fidelity, so the higher-rate states of the system are used for them. Further improvement is obtained by using gamma-populated excitation codebooks for those states that mainly encode speech signals with strong underlying glottal excitation pulses. Experiments focus on the varying encoding requirements of the excitation signal for low-pass, voiced, unvoiced and transient speech signals. The parameters of the finite-state CELP system are designed to match the encoding requirements of typical speech signals. The greater part of the coding gain is obtained from variable-rate encoding of the excitation signal. Using a six-state FSCELP, good-quality speech is obtained at average, maximum and minimum bit rates of 4 kbit/s, 10 kbit/s and 2 kbit/s, respectively.
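The relation between the per-state rates and the reported average can be illustrated with a short sketch: the average bit rate is the sum of each state's rate weighted by the fraction of frames encoded in that state. Only the 2 kbit/s minimum, the 10 kbit/s maximum and the 4 kbit/s average come from the abstract; the four intermediate state rates and the occupancy fractions below are hypothetical values chosen so that the numbers are consistent, not measured results.

```python
from math import isclose


def average_bit_rate(state_rates_kbps, occupancy):
    """Return the occupancy-weighted mean rate in kbit/s.

    state_rates_kbps: encoding rate of each state.
    occupancy: fraction of frames encoded in each state (must sum to 1).
    """
    if len(state_rates_kbps) != len(occupancy):
        raise ValueError("one occupancy value is needed per state")
    if not isclose(sum(occupancy), 1.0, abs_tol=1e-9):
        raise ValueError("occupancy fractions must sum to 1")
    return sum(r * p for r, p in zip(state_rates_kbps, occupancy))


# Hypothetical six-state configuration. Only the end points (2 and 10 kbit/s)
# are taken from the abstract; the intermediate rates are assumptions.
STATE_RATES_KBPS = [2.0, 3.0, 4.0, 6.0, 8.0, 10.0]

# Hypothetical fraction of frames spent in each state, skewed towards the
# low-rate states used for steady-state, highly correlated speech.
OCCUPANCY = [0.40, 0.20, 0.10, 0.15, 0.10, 0.05]

avg = average_bit_rate(STATE_RATES_KBPS, OCCUPANCY)
print(f"average rate: {avg:.2f} kbit/s")
```

Under these assumed figures, an average close to the minimum rate is only possible when most frames fall into the low-rate states, which is consistent with the claim that those states serve steady-state segments.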