Digital video's increased popularity has been driven to a large extent by a flurry of international standards (MPEG-1, MPEG-2, H.263, etc.). Most of these standards do not define the rate control scheme, which plays an important role in improving and stabilizing decoding and playback quality, so different strategies can be implemented in each encoder design. Several rate-distortion (R-D)-based techniques have been proposed that aim to achieve the best possible quality for a given channel rate and buffer size. These approaches are complex because they require the R-D characteristics of the input data to be measured before quantization assignment decisions can be made. We show how the complexity of computing the R-D data can be reduced without significantly reducing the performance of the optimization procedure. We propose two methods that provide successive reductions in complexity by (1) using models to interpolate the rate and distortion characteristics, and (2) using past frames instead of current ones to determine the models. Our first method is applicable to situations where a long encoding delay is possible (e.g., broadcast video), while our second approach is more useful for computation-constrained interactive video applications. The first method can also be used to benchmark other approaches. Both methods can achieve a peak signal-to-noise ratio (PSNR) gain of over 1 dB relative to simple methods such as the MPEG Test Model 5 (TM5) rate control, with even greater gains during scene change transitions. In addition, both methods make few a priori assumptions and perform robustly over a range of video sources and encoding rates. In terms of complexity, our first algorithm roughly doubles the encoding time compared with simpler techniques such as TM5, but its complexity is greatly reduced compared with methods that measure the R-D data exactly. Our second algorithm has a complexity marginally higher than that of TM5 and a PSNR performance slightly lower than that of the first approach.
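To make the idea of model-interpolated R-D data concrete, the following is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a rate model R(q) = a + b/q and a distortion model D(q) = c·q², both fitted from a few trial encodes at sample quantizers, and then picks one quantizer per frame by minimizing D + λR, bisecting on λ to meet a total bit budget. The names `RDModel`, `fit_model`, `choose_quantizers` and `allocate`, the candidate quantizer range 1..31, and both model forms are hypothetical. Frames are treated independently, and the buffer constraint is replaced by a single total-rate budget for brevity.

```python
"""Sketch: model-interpolated R-D data plus Lagrangian quantizer selection.

Assumptions (not from the source): rate model R(q) = a + b/q,
distortion model D(q) = c*q**2, candidate quantizers 1..31, and a
single total-bit budget in place of a full buffer model.
"""
from dataclasses import dataclass

QUANTIZERS = range(1, 32)  # hypothetical candidate quantizer scales


@dataclass
class RDModel:
    a: float  # rate offset (bits)
    b: float  # rate slope against 1/q (bits)
    c: float  # distortion scale (MSE per q^2)

    def rate(self, q: int) -> float:
        # Interpolated rate; clamped so the model never predicts negative bits.
        return max(self.a + self.b / q, 0.0)

    def dist(self, q: int) -> float:
        # Interpolated distortion (MSE).
        return self.c * q * q


def fit_model(samples):
    """Fit an RDModel from a few measured (q, bits, mse) triples.

    Needs at least two distinct quantizers for the rate regression.
    """
    xs = [1.0 / q for q, _, _ in samples]
    rs = [r for _, r, _ in samples]
    n = len(samples)
    mx, mr = sum(xs) / n, sum(rs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    # Ordinary least squares of rate against 1/q.
    b = sum((x - mx) * (r - mr) for x, r in zip(xs, rs)) / sxx
    a = mr - b * mx
    # Least-squares fit of D = c*q^2 through the origin.
    c = sum(d * q * q for q, _, d in samples) / sum(q ** 4 for q, _, _ in samples)
    return RDModel(a, b, c)


def choose_quantizers(models, lam):
    """For each frame, pick the q minimizing D(q) + lam * R(q)."""
    return [min(QUANTIZERS, key=lambda q: m.dist(q) + lam * m.rate(q))
            for m in models]


def allocate(models, budget_bits, iters=50):
    """Bisect on lambda so the total modeled rate fits the budget.

    Larger lambda favors lower rate; we keep the smallest lambda found
    whose allocation still fits, i.e. the lowest-distortion feasible one.
    """
    lo, hi = 0.0, 1e6
    best = choose_quantizers(models, hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        qs = choose_quantizers(models, mid)
        total = sum(m.rate(q) for m, q in zip(models, qs))
        if total <= budget_bits:
            best, hi = qs, mid
        else:
            lo = mid
    return best


if __name__ == "__main__":
    # Three hypothetical trial encodes for one frame, then interpolate.
    frame = fit_model([(2, 61000.0, 2.0), (8, 16000.0, 32.0), (31, 4871.0, 480.5)])
    print(allocate([frame, frame], budget_bits=40000))
```

Only a few trial encodes per frame are needed to fit each model, instead of one per candidate quantizer, which is where the complexity saving comes from in this sketch. A lower-cost variant in the spirit of the second method would call `fit_model` on the previous frame's measurements and reuse that model for the current frame, avoiding extra trial encodes at the price of less accurate predictions around scene changes.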