In this paper, we describe a high-quality, low-complexity, scalable audio coding scheme that represents the signal in an optimum wavelet packet (WP) basis chosen according to the time-varying characteristics of the audio signal. In the ISO/MPEG audio coding standards [1–3], the resolution of the (uniform) decomposition filterbank does not match the resolution required by the psychoacoustic model, which needs finer resolution matched to the non-uniform critical bands of the cochlea. The MPEG coder therefore uses a separate high-resolution decomposition filterbank to implement the psychoacoustic model, which increases the computational load of the coder. Here, we use a wavelet packet decomposition structure that closely matches the critical bands [4,5] of the human auditory system to transform the data into the wavelet domain, and the resulting wavelet packet coefficients drive the psychoacoustic model directly. The psychoacoustic model design is thus integrated with the design of the decomposition filterbank. Other features of the proposed coder are scalability (it supports three standard industrial sampling frequencies: 11.025 kHz, 22.050 kHz, and 44.1 kHz) and selection of the optimum wavelet basis from a predefined library of wavelet bases, based on seven statistical features extracted from the audio signal to be encoded. A new vector quantization (VQ) scheme is also proposed, in which the length of the codebook can be varied according to the requirements of the psychoacoustic model. Experimental results show that the proposed coder yields almost transparent quality with compression ratios in the range of e to 10.
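The resolution mismatch between a uniform filterbank and the critical bands can be illustrated numerically. The sketch below is not part of the proposed coder: it assumes a uniform split of the spectrum into 32 equal-width bands (chosen only for illustration) and uses the Zwicker-Terhardt approximation of the Bark scale to measure how many critical bands each uniform band spans. The function names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark (critical band) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def uniform_band_edges(fs_hz, n_bands):
    """Edges of n_bands equal-width bands covering 0 Hz up to the Nyquist frequency."""
    nyquist = fs_hz / 2.0
    return [k * nyquist / n_bands for k in range(n_bands + 1)]


def bark_widths(edges):
    """Width of each band in Bark, i.e. how many critical bands it spans."""
    return [hz_to_bark(hi) - hz_to_bark(lo) for lo, hi in zip(edges, edges[1:])]


if __name__ == "__main__":
    # Assumption: 32 uniform bands, used here purely as an illustrative figure.
    for fs in (11025.0, 22050.0, 44100.0):
        widths = bark_widths(uniform_band_edges(fs, 32))
        print(f"fs = {fs / 1000:.3f} kHz: lowest band spans {widths[0]:.2f} Bark, "
              f"highest band spans {widths[-1]:.2f} Bark")
```

Under these assumptions, at 44.1 kHz the lowest uniform band covers several critical bands while the highest covers only a small fraction of one, which is the kind of mismatch a critical-band-matched wavelet packet tree is intended to avoid.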