Abstract

Speech coding at low rates usually requires the coding of excitation information such as fundamental frequency (F0). Typically, F0 is calculated every 10–25 ms and represented using 5–6 bits. Thus 200–600 bits/s are dedicated to excitation. Since F0 does not change radically between consecutive voiced frames, a differential coding scheme can partially reduce the F0 rate without perceptual degradation in the synthetic speech. However, even lower rates can be achieved by approximating the F0 patterns as linear rises and falls. Each such change is coded as a pair of F0 values and a duration. Specifically, each peak or valley in a smoothed F0 contour is coded, as well as the elapsed time since the previous point. Typical English utterances (not including pauses) appear to average seven peaks and valleys per second. If the durations are coded using 4 bits/sample, F0 contours would occupy less than 70 bits/s with little degradation in speech quality. Such a reduction in F0 storage rate could be useful in very low rate vocoders.
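
The rate figures above can be checked with a short sketch. It is an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical, quantization is omitted, and the turning-point detector is a naive slope-sign test that assumes the contour is already smoothed. The sketch also assumes that each coded point keeps the 5–6 bit F0 representation used for frame-based coding, which the abstract does not state explicitly.

```python
def frame_rate_bits(frame_ms, f0_bits):
    """Bits/s when one F0 value is sent per frame."""
    return 1000 / frame_ms * f0_bits


def point_rate_bits(points_per_second, f0_bits, duration_bits):
    """Bits/s when each peak or valley carries an F0 value plus a duration."""
    return points_per_second * (f0_bits + duration_bits)


def turning_points(f0):
    """Indices of local peaks and valleys, plus both endpoints.

    Assumes a smoothed contour; flat plateaus are not detected.
    """
    idx = [0]
    for i in range(1, len(f0) - 1):
        left = f0[i] - f0[i - 1]
        right = f0[i + 1] - f0[i]
        if left * right < 0:  # slope changes sign: peak or valley
            idx.append(i)
    idx.append(len(f0) - 1)
    return idx


def encode(f0):
    """Code each turning point as (F0 value, frames elapsed since previous point)."""
    pts = turning_points(f0)
    return [(f0[i], i - prev) for prev, i in zip([0] + pts[:-1], pts)]


def decode(code):
    """Rebuild the contour as linear rises and falls between coded points."""
    out = [code[0][0]]
    for value, gap in code[1:]:
        start = out[-1]
        for k in range(1, gap + 1):
            out.append(start + (value - start) * k / gap)
    return out


contour = [100, 110, 120, 130, 120, 110, 100]  # one rise, one fall (Hz)
code = encode(contour)
rebuilt = decode(code)

conventional = (frame_rate_bits(25, 5), frame_rate_bits(10, 6))
proposed = (point_rate_bits(7, 5, 4), point_rate_bits(7, 6, 4))
```

Under these assumptions, frame-based coding gives 200 to 600 bits/s, while seven points per second with 4-bit durations give 63 bits/s for 5-bit F0 values and 70 bits/s for 6-bit values, in line with the abstract's estimate of roughly 70 bits/s or less.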
