Subband video coding with a dynamic bit allocation and geometric vector quantization

Christine I. Podilchuk, Arnaud E. Jacquin

doi:10.1117/12.135971

Abstract

Coding results at 384 kbps are presented for a three-dimensional (3-D) subband framework in which the original image data are decomposed into spatio-temporal frequency bands. This tree-structured framework was originally introduced in [1]. The 3-D subband decomposition consists of two temporal subbands followed by a cascade of spatial decompositions, as shown in Figure 1. The temporal filtering is based on the 2-tap Haar filterbank, while the spatial filtering, both horizontal and vertical, is based on the 10-tap quadrature mirror filterbanks (QMFs) of Johnston [2]. The 11 subbands of any video sequence are displayed according to the template in Figure 2. Figure 3 shows the frequency decomposition obtained with the framework of Figure 1 for one of the image sequences presented here, "Melanie". A more extensive study of the effects of different filter types on the coding results in a subband framework will be discussed in [3]. In general, for the image sequences that we have examined, subbands 9, 10, and 11 can be discarded without severely degrading the image sequences, because the higher frequency bands have low signal energy and low perceptual sensitivity.
A fixed coding rate implies a fixed number of bits for the subbands at each instant in time. The bits are adaptively allocated to the subbands based on a local energy criterion. More bits are allocated to subband 8 (the low spatial-high temporal frequency band) when the motion activity is high. Bits are dynamically reallocated to subbands 2-6 (the high spatial-low temporal frequency bands) when the motion activity drops below a threshold. Conditional replenishment (CR) is also implemented in all of the frequency bands in order to code static objects and background at a very low bit rate. This results in a locally adaptive frame rate coder. CR implemented in the 3-D framework also appears in [4, 5].
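
The temporal stage and the threshold-based reallocation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the average/difference scaling of the Haar pair, the use of mean squared value of the high temporal band as a measure of motion activity, and the names `haar_temporal_split`, `band_energy`, and `priority_bands` are all assumptions made for illustration. The abstract does not state the threshold value or the exact energy criterion.

```python
def haar_temporal_split(frame_a, frame_b):
    """Split two consecutive frames into low and high temporal subbands.

    Assumption: the unnormalized average/difference form of the 2-tap Haar
    pair is used; an orthonormal version would scale by 1/sqrt(2) instead.
    Frames are lists of rows of pixel values.
    """
    low = [[(a + b) / 2 for a, b in zip(ra, rb)]
           for ra, rb in zip(frame_a, frame_b)]
    high = [[(a - b) / 2 for a, b in zip(ra, rb)]
            for ra, rb in zip(frame_a, frame_b)]
    return low, high


def haar_temporal_merge(low, high):
    """Invert haar_temporal_split and return the two original frames."""
    frame_a = [[l + h for l, h in zip(rl, rh)] for rl, rh in zip(low, high)]
    frame_b = [[l - h for l, h in zip(rl, rh)] for rl, rh in zip(low, high)]
    return frame_a, frame_b


def band_energy(band):
    """Mean squared sample value, used here as a stand-in energy measure."""
    count = sum(len(row) for row in band)
    return sum(v * v for row in band for v in row) / count


def priority_bands(motion_energy, threshold):
    """Pick which subbands receive the spare bits.

    Follows the rule stated in the text: subband 8 when motion activity is
    high, subbands 2-6 when it drops below a threshold. The threshold value
    itself is hypothetical.
    """
    return [8] if motion_energy >= threshold else [2, 3, 4, 5, 6]
```

For a static region the two frames are identical, so the high temporal band is all zeros and the spare bits go to the spatial detail bands, which matches the behaviour the abstract describes for low motion activity.
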
The lowest frequency subbands, labeled subbands 1-3 in Figure 1, contain the most signal energy and require very high quality encoding in order to preserve good reconstructed picture quality. These subbands have first priority in the bit allocation and are encoded using PCM with a uniform quantizer. Once conditional replenishment allows the bit rate to drop for subbands 1-3, the higher frequency subbands can be encoded more accurately with the additional bits. The significant highpass subbands (for the case examined here, subbands 4-11 in Figure 2) are encoded using a new form of vector quantization called Geometric Vector Quantization (GVQ), which takes advantage of the sparse and highly structured characteristics of the upper frequency data. GVQ was first introduced in [6, 7]. GVQ uses purely deterministic codebooks that require no training. The codebook entries consist of L levels (where L and the codevector size are determined by the overall bit rate). For the results presented here, we examine a codebook consisting of 3 levels, with some constraints on the levels in order to reduce the search complexity. Section 2 discusses the bit allocation, conditional replenishment, and encoding of the lowest frequency bands. Section 3 describes the generalized GVQ method and results based on 3-level quantization. Section 4 includes coding results, future avenues of research, and the conclusion.
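
To make the idea of a training-free, L-level codebook concrete, here is a minimal Python sketch of a 3-level search over a small deterministic set of patterns. It is an illustration under stated assumptions, not the GVQ algorithm of [6, 7]: the toy codebook `PATTERNS`, the 1x4 block size, the choice of region means as levels, and the function names are hypothetical, and the level constraints mentioned in the text are not modeled because the abstract does not specify them.

```python
def gvq_encode_block(block, patterns):
    """Return (pattern_index, levels, error) for the best-matching pattern.

    block    : flat list of samples from a highpass subband
    patterns : list of label vectors, same length as block, labels in {0, 1, 2}
    """
    best = None
    for index, labels in enumerate(patterns):
        # For a fixed pattern, the level that minimizes squared error in
        # each region is the mean of the samples carrying that label.
        levels = []
        for region in range(3):
            members = [x for x, lab in zip(block, labels) if lab == region]
            levels.append(sum(members) / len(members) if members else 0.0)
        error = sum((x - levels[lab]) ** 2 for x, lab in zip(block, labels))
        if best is None or error < best[2]:
            best = (index, levels, error)
    return best


def gvq_decode_block(labels, levels):
    """Rebuild a block from a pattern and its three levels."""
    return [levels[lab] for lab in labels]


# Toy deterministic codebook for 1x4 blocks (hypothetical, not from the paper).
PATTERNS = [
    [0, 0, 0, 0],  # flat block
    [0, 0, 1, 1],  # single step in the middle
    [0, 1, 1, 2],  # three regions, wide center
    [0, 1, 2, 2],  # three regions, wide right side
]
```

The encoder would transmit the pattern index and the levels; the decoder needs only the same fixed pattern table, which is why no training data is involved.
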
