Abstract
Binaural cue coding (BCC) is an efficient technique for spatial audio rendering that uses side information such as the interchannel level difference (ICLD), interchannel time difference (ICTD), and interchannel correlation (ICC). Of these cues, the ICTD plays an important role in forming the auditory spatial image. However, inaccurate estimation of the ICTD may degrade the audio quality. In this paper, we develop a novel ICTD estimation algorithm based on the nonuniform discrete Fourier transform (NDFT) and integrate it with the BCC approach to improve the decoded auditory image. Furthermore, a new subjective assessment method is proposed for evaluating the auditory image widths of decoded signals. The test results demonstrate that the NDFT-based scheme achieves a much wider and more externalized auditory image than the existing BCC scheme based on the discrete Fourier transform (DFT). It is also found that the present technique, regardless of the image width, does not deteriorate the sound quality at the decoder compared to the traditional scheme without ICTD estimation.
Highlights
Since 1990, joint stereo coding algorithms have been widely used in two-channel audio coding
BCC is based on spatial hearing theory [6] and uses binaural cues such as the interaural level difference (ILD), interaural time difference (ITD), and interaural coherence (IC) for rendering spatial audio
As the input multichannel audio signals are downmixed into a mono sum signal, side information comprising interchannel cues is analyzed and extracted, and both the sum signal and the side information are transmitted to the decoder
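The encoder step described in the last highlight can be illustrated with a minimal sketch. It assumes a two-channel, full-band frame and uses textbook definitions of the three cues (power ratio in dB for the ICLD, lag of the peak normalized cross-correlation for the ICTD, and the peak value itself for the ICC). The function name `bcc_cues`, the `max_lag` parameter, and the simple averaging downmix are illustrative assumptions, not the scheme evaluated in the paper, which works per subband.

```python
import numpy as np

def bcc_cues(x1, x2, max_lag=40):
    """Return (sum_signal, icld_db, ictd_samples, icc) for one stereo frame.

    Illustrative full-band sketch; x1 and x2 must have equal length.
    A positive ICTD means channel 2 lags channel 1.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n = len(x1)
    eps = 1e-12  # guards against division by zero on silent frames

    # Mono downmix that would be passed to the audio coder.
    sum_signal = 0.5 * (x1 + x2)

    # ICLD: power of channel 2 relative to channel 1, in dB.
    p1 = np.sum(x1 ** 2)
    p2 = np.sum(x2 ** 2)
    icld_db = 10.0 * np.log10((p2 + eps) / (p1 + eps))

    # Normalized cross-correlation: sum_n x1[n] * x2[n + d] for each lag d.
    lags = np.arange(-max_lag, max_lag + 1)
    norm = np.sqrt(p1 * p2) + eps
    xcorr = np.array([
        np.sum(x1[max(0, -d):n - max(0, d)] * x2[max(0, d):n - max(0, -d)])
        for d in lags
    ]) / norm

    # ICTD is the lag of the correlation peak; ICC is the peak value.
    best = int(np.argmax(xcorr))
    return sum_signal, icld_db, int(lags[best]), float(xcorr[best])
```

In this sketch only `sum_signal` and the three scalar cues would be sent to the decoder, which is what makes the representation compact.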
Summary
Since 1990, joint stereo coding algorithms have been widely used in two-channel audio coding. BCC exploits binaural cue parameters to capture the spatial image of multichannel audio and enables low-bitrate transmission by sending a mono signal plus side information related to binaural perception. For BCC schemes applied to loudspeaker playback or amplitude-panned signals, the time difference cue plays little role in widening and externalizing the auditory image. At frequencies below about 1–1.5 kHz, the ICTD is an important binaural cue for headphone playback [7]. The generic BCC scheme estimates the ICTD in frequency subbands partitioned according to psychoacoustic critical bands [9]. The DFT method may not analyze subband properties properly, so the BCC scheme with ICTD estimation is unable to improve the audio quality and may even deteriorate it.
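To make the idea of subband ICTD estimation on a nonuniform frequency grid concrete, here is a minimal sketch. It evaluates the Fourier transform directly at arbitrary frequencies (an NDFT by brute-force summation) and picks the delay that best aligns the phase of the cross-spectrum within one band. The function names, the log-spaced grid, and the delay search are all assumptions made for illustration; this section does not describe the paper's actual algorithm, and the sketch should not be read as a reproduction of it.

```python
import numpy as np

def ndft(x, freqs_hz, fs):
    """Evaluate X(f) = sum_n x[n] * exp(-j*2*pi*f*n/fs) at arbitrary frequencies."""
    n = np.arange(len(x))
    kernel = np.exp(-2j * np.pi * np.outer(freqs_hz, n) / fs)
    return kernel @ np.asarray(x, dtype=float)

def estimate_ictd(x1, x2, fs, freqs_hz, max_delay_s=1e-3, n_candidates=401):
    """Estimate the ICTD (seconds) from spectral samples at freqs_hz.

    If x2 is x1 delayed by tau0, then X1 * conj(X2) has phase +2*pi*f*tau0,
    so the candidate delay that cancels this phase maximizes the score.
    A positive result means channel 2 lags channel 1.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    cross = ndft(x1, freqs_hz, fs) * np.conj(ndft(x2, freqs_hz, fs))
    taus = np.linspace(-max_delay_s, max_delay_s, n_candidates)
    steering = np.exp(-2j * np.pi * np.outer(taus, freqs_hz))
    score = np.real(steering @ cross)
    return float(taus[np.argmax(score)])

# Hypothetical usage: 64 log-spaced points covering the low-frequency
# region where the ICTD matters for headphone playback.
fs = 16000
band_freqs = np.geomspace(200.0, 1500.0, 64)
```

Unlike a DFT, whose bins are fixed at multiples of `fs / N`, the frequency grid here is chosen freely, so a narrow low-frequency band can be given as many analysis points as desired. The direct summation costs O(N·K) for N samples and K frequencies, which is acceptable for a sketch but not optimized.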