Abstract

This paper proposes audio coding using an efficient long-term prediction method to enhance the perceptual quality of audio codecs to speech input signals at low bit-rates. The MPEG-4 AAC-LTP exploited a similar concept, but its improvement was not significant because of small prediction gain due to long prediction lags and aliased components caused by the transformation with a time-domain aliasing cancelation (TDAC) technique. The proposed algorithm increases the prediction gain by employing a deharmonizing predictor and a long-term compensation filter. The look-back memory elements are first constructed by applying the de-harmonizing predictor to the input signal, then the prediction residual is encoded and decoded by transform audio coding. Finally, the long-term compensation filter is applied to the updated look-back memory of the decoded prediction residual to obtain synthesized signals. Experimental results show that the proposed algorithm has much lower spectral distortion and higher perceptual quality than conventional approaches especially for harmonic signals, such as voiced speech.

Highlights

  • The main objective of speech and audio coding algorithms is to represent an input signal with as few bits as possible while maintaining high perceptual quality; their fundamental design concepts are somewhat different

  • Speech coding that employs the voice production mechanism is used for bidirectional communications, while audio coding that utilizes the hearing mechanism is used for one-way broadcasting services in general

  • In response to the Call for Proposal (CfP) on the unified speech and audio coding (USAC), 8 candidate systems have been submitted, and a reference model was selected through a competitive evaluation process [5, 6]

Read more

Summary

Introduction

The main objective of speech and audio coding algorithms is to represent an input signal with as few bits as possible while maintaining high perceptual quality; their fundamental design concepts are somewhat different. Relatively long transform analysis leads to roughness, because the pitch is rather frequently varied in the transform duration, and the harmonic components in the frequency domain are not ensured to be preserved by perceptual bit allocation As an another aspect, it should be noted that each peak and valley coming from the pitch harmonics might be independently coded in the transform domain, it is less efficient to code them as much as to be done by namely long-term prediction in many speech coders [12]. This paper proposes a new long-term prediction structure that can be integrated into transform-based audio coding algorithms. The harmonic components of input signal are first reduced by a deharmonizing long-term predictor, and the predicted signal is encoded and decoded by a transform coder. Simulation results obtained from objective and subjective tests confirm the superiority of the proposed algorithm especially for speech and concatenated signals

Limitation oF AAC-LTP
Proposed Algorithm
Implementation
20 Speech
Performance Evaluation
Findings
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call