Abstract

As speech-coding standards have improved over the years, their complexity has increased, and less emphasis has been placed on low encoding/decoding delay. We present a low-complexity, low-delay speech codec based on tree coding with sample-by-sample adaptive long- and short-code generators. The codec incorporates pre- and post-filtering for perceptual weighting, as well as multimode speech classification with comfort noise generation (CNG). The pre-/post-weighting filters adapt based on the code generator parameters, which are available at both the encoder and the decoder, rather than on the input speech as is usual. Likewise, the multiple speech modes and comfort noise generation are coded using the code generator adaptation algorithms rather than the input speech. Codec complexity comparisons are presented, and operational rate-distortion curves are given for several standardized speech codecs and the new codec. Finally, codec performance is shown in relation to theoretical rate-distortion bounds.
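The following minimal Python sketch makes tree coding with a backward-adaptive code generator more concrete. It is not the codec described above. The first-order predictor coefficient, the four-level innovation alphabet, the Jayant-style step-size multipliers, and the function names `generator_step`, `tree_encode`, and `tree_decode` are all hypothetical. The sketch does illustrate the property the abstract relies on: the generator adapts only from previously chosen symbols, so the decoder can track every parameter without side information. Meanwhile, the encoder runs an M-algorithm search that keeps the `m_paths` lowest-error paths at each sample.

```python
import numpy as np

# Hypothetical parameters for illustration only (not taken from the paper).
PRED_COEF = 0.9                      # fixed first-order short-term predictor
LEVELS = (-1.5, -0.5, 0.5, 1.5)      # 2 bits/sample innovation alphabet (times step size)
STEP_MULT = {0: 1.6, 1: 0.9, 2: 0.9, 3: 1.6}  # Jayant-style multipliers per symbol index
STEP_MIN, STEP_MAX = 1e-3, 10.0


def generator_step(prev_out, step, symbol):
    """One sample of the code generator: prediction plus scaled innovation.

    The step size adapts using only the chosen symbol, so the decoder can
    reproduce it exactly (backward adaptation, no side information)."""
    out = PRED_COEF * prev_out + LEVELS[symbol] * step
    new_step = min(max(step * STEP_MULT[symbol], STEP_MIN), STEP_MAX)
    return out, new_step


def tree_encode(x, m_paths=4, init_step=0.1):
    """M-algorithm tree search: keep the m_paths lowest-distortion paths per sample."""
    # Each path: (cumulative squared error, symbols, last output, step size)
    paths = [(0.0, [], 0.0, init_step)]
    for sample in x:
        candidates = []
        for err, syms, prev, step in paths:
            for s in range(len(LEVELS)):
                out, new_step = generator_step(prev, step, s)
                candidates.append((err + (sample - out) ** 2, syms + [s], out, new_step))
        candidates.sort(key=lambda c: c[0])
        paths = candidates[:m_paths]
    return paths[0][1]  # symbol sequence of the best surviving path


def tree_decode(symbols, init_step=0.1):
    """Decoder: rerun the same code generator driven by the received symbols."""
    prev, step, out = 0.0, init_step, []
    for s in symbols:
        prev, step = generator_step(prev, step, s)
        out.append(prev)
    return np.array(out)
```

For simplicity, the sketch releases the best path only at the end of the input. A low-delay tree coder would instead release each symbol after a fixed search depth, which bounds the encoding delay.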

Highlights

  • Speech coding has a history of more than 50 years, but current research directions involving linear prediction can be traced back to the mid-to-late 1960s [1,2].

  • When Voice Activity Detection (VAD)/comfort noise generation (CNG) is used with G.728, 0.5 bit/sample is saved for “lathe”, as seen in Figure 15, but Figure 16 shows no reduction in rate when VAD/CNG is used with G.728 because …

  • The performance evaluations reveal that the design decision to move the perceptual weighting outside the analysis-by-synthesis loop can still perform well while substantially reducing algorithmic complexity.
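One common way to realize perceptual pre-/post-weighting is the filter W(z) = A(z/γ1)/A(z/γ2), where A(z) is the short-term prediction-error polynomial. The sketch below assumes that form, with hypothetical values γ1 = 0.9 and γ2 = 0.6. The paper's actual weighting filter may differ. The encoder applies W(z) to the speech and then codes it with a plain squared-error criterion. The decoder applies 1/W(z), which shapes the coding noise by 1/W(z). Because A(z) is built from code generator coefficients that both ends already have, no weighting parameters need to be transmitted, and the weighting stays outside the tree search. If A(z) is minimum phase, A(z/γ) is too for γ ≤ 1, so both W(z) and its inverse are stable.

```python
import numpy as np

GAMMA1, GAMMA2 = 0.9, 0.6  # hypothetical bandwidth-expansion factors


def bandwidth_expand(a_poly, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient is scaled by gamma**k."""
    a_poly = np.asarray(a_poly, dtype=float)
    return a_poly * gamma ** np.arange(len(a_poly))


def iir_filter(b, a, x):
    """Direct-form filter: y[n] = (sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k]) / a[0]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc / a[0]
    return y


def pre_weight(x, a_poly, g1=GAMMA1, g2=GAMMA2):
    """Encoder side: apply W(z) = A(z/g1) / A(z/g2) before the tree search."""
    return iir_filter(bandwidth_expand(a_poly, g1), bandwidth_expand(a_poly, g2), x)


def post_weight(y, a_poly, g1=GAMMA1, g2=GAMMA2):
    """Decoder side: apply the inverse filter 1/W(z) = A(z/g2) / A(z/g1)."""
    return iir_filter(bandwidth_expand(a_poly, g2), bandwidth_expand(a_poly, g1), y)
```

Here `a_poly` = [1, -a1, ..., -ap] stands in for the code generator's short-term predictor. In a real codec, it would be updated sample by sample, whereas the sketch holds it fixed so that post-weighting exactly undoes pre-weighting when no coding noise is present.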

Summary

Introduction

Speech coding has a history of more than 50 years, but current research directions involving linear prediction can be traced back to the mid-to-late 1960s [1,2]. High complexity is an obvious challenge in many applications, particularly with respect to battery power in mobile devices, while increased latency can affect conversational voice quality and, perhaps more subtly, cellular capacity. With these ideas in mind, we have conducted research on speech codecs with greatly reduced complexity and latency, while attempting to strike a balance between coded speech quality and required bitrate. The input speech is classified into modes, which are coded separately. Since this coder uses multiple modes, perceptual pre- and post-weighting, tree searching, and a pitch predictor, it is denoted as the Multimode Tree Coder with Pre- and …
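To illustrate what classifying input speech into separately coded modes can look like, the sketch below labels 10 ms frames (80 samples at an assumed 8 kHz sampling rate) as `silence`, `unvoiced`, or `voiced` using frame energy and zero-crossing rate. The mode names, thresholds, and the `classify_frames` function are hypothetical. The paper's classifier and its set of modes may differ. A `silence` frame is where a codec would switch to comfort noise generation.

```python
import numpy as np

# Hypothetical thresholds, chosen for illustration only.
ENERGY_SILENCE = 1e-4  # mean-square energy below which a frame is treated as silence (CNG)
ZCR_UNVOICED = 0.3     # zero-crossing rate above which an active frame is treated as unvoiced


def classify_frames(x, frame_len=80):
    """Label each complete frame of x as 'silence', 'unvoiced', or 'voiced'."""
    modes = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        energy = np.mean(frame ** 2)
        # Fraction of adjacent sample pairs whose signs differ.
        zcr = np.mean(np.signbit(frame[1:]) != np.signbit(frame[:-1]))
        if energy < ENERGY_SILENCE:
            modes.append("silence")
        elif zcr > ZCR_UNVOICED:
            modes.append("unvoiced")
        else:
            modes.append("voiced")
    return modes
```

Energy and zero-crossing rate are used here only because they are simple and widely known. A practical classifier would typically add smoothing (hangover) so that modes do not flicker from frame to frame.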
