Abstract

In this paper, we present a medium-rate speech coder, the controlled adaptive prediction delta modulation (CAPDM) coder, which operates at 16 kb/s with good speech quality and low algorithmic complexity. The coder is intended for personal communication network (PCN) applications and transmits speech in packets. It combines a one-step look-ahead decision, syllabic companding, instantaneous companding, and adaptive prediction. In addition to a short-term prediction filter, CAPDM explicitly exploits pitch periodicity to predict the speech waveform. With the aid of a pitch prediction filter, the performance of the CAPDM codec improves by about 3 dB in segmental signal-to-noise ratio (SEGSNR). The average SEGSNR of CAPDM.FF is about 21 dB, which is 7 dB higher than that of traditional CVSD at 16 kb/s. We also use an adaptive postfilter (APF) to enhance the perceptual quality of the decoded speech. In a mean opinion score (MOS) listening test, CAPDM.FF with APF obtains an average score of 4.19, which is as good as G.728 16-kb/s LD-CELP and comparable with CCITT G.721 32-kb/s ADPCM. The complexity of CAPDM.FF is estimated at 8 MIPS, which is much lower than that of LD-CELP and could be reduced further by adopting a smaller correlation window for pitch detection. To handle packet loss, we developed a packet-based waveform substitution method in which the codec parameters are reinitialized at the beginning of each packet. Simulation results show that CAPDM.FF can tolerate 5% packet loss while maintaining a SEGSNR of 10 dB and an MOS of about 3.0.
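
Since the objective results above are reported as SEGSNR, a minimal sketch of how a segmental SNR is commonly computed may help the reader interpret them. This is not the authors' evaluation code: the function name `segmental_snr`, the frame length, and the per-frame clamping limits are assumptions chosen for illustration, and the abstract does not state which values were used.

```python
import math


def segmental_snr(reference, decoded, frame_len=128, floor_db=-10.0, ceil_db=35.0):
    """Average of per-frame SNR values in dB (hypothetical helper).

    frame_len, floor_db and ceil_db are illustrative assumptions,
    not parameters taken from the paper.
    """
    if len(reference) != len(decoded):
        raise ValueError("reference and decoded must have the same length")

    frame_snrs = []
    # Walk over complete, non-overlapping frames only.
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal_energy == 0.0:
            continue  # skip silent frames, which have no defined SNR
        if noise_energy == 0.0:
            snr_db = ceil_db  # perfect reconstruction: use the upper limit
        else:
            snr_db = 10.0 * math.log10(signal_energy / noise_energy)
        # Clamp so that a few extreme frames do not dominate the average.
        frame_snrs.append(min(max(snr_db, floor_db), ceil_db))

    if not frame_snrs:
        raise ValueError("no non-silent complete frames to evaluate")
    return sum(frame_snrs) / len(frame_snrs)


# Usage: a sine wave "decoded" with a 10% amplitude error.
reference = [math.sin(2.0 * math.pi * k / 32.0) for k in range(256)]
decoded = [0.9 * x for x in reference]
print(round(segmental_snr(reference, decoded), 2))
```

Averaging per-frame values in dB, rather than computing one SNR over the whole utterance, keeps loud segments from masking poor performance in quiet ones, which is why segmental measures are often preferred for waveform coders.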
