The discrete cosine transform (DCT) is one of the major components in most image and video compression systems. The variable complexity algorithm framework has been applied successfully to achieve complexity savings in the computation of the inverse DCT in decoders. These gains are possible because of the highly predictable sparseness of the quantized DCT coefficients in natural image/video data. With the increasing demand for instant video messaging and two-way video transmission over mobile communication systems running on general-purpose embedded processors, the encoding complexity also needs to be optimized. In this paper, we focus on complexity reduction techniques for the forward DCT, which is one of the more computationally intensive tasks in the encoder. Unlike the inverse DCT, the forward DCT does not operate on sparse input data, but rather generates sparse output data. Thus, complexity reduction must be obtained by methods different from those used for the inverse DCT. In the literature, two major approaches have been applied to speed up the forward DCT computation: frequency selection, in which only a subset of the DCT coefficients is computed, and accuracy selection, in which all the DCT coefficients are computed with reduced accuracy. Both approaches can achieve significant computation savings with minor output quality degradation, as long as the coding parameters are such that the quantization error is larger than the error due to the approximate DCT computation. Thus, in order to be useful, these algorithms have to be combined with an efficient mechanism that can select the "right" level of approximation as a function of the characteristics of the input and the target rate, a selection that is often based on heuristic criteria. In this paper, we consider two previously proposed fast, variable complexity, forward DCT algorithms, one based on frequency selection, the other based on accuracy selection. 
We provide an explicit analysis of the additional distortion that each scheme introduces as a function of the quantization parameter and the variance of the input block. This analysis allows us to improve the performance of these algorithms by making it possible to select the best approximation level for each block and target quantization parameter. We also propose a hybrid algorithm that combines both forms of complexity reduction in order to achieve better overall performance over a broader range of operating rates. We show how our techniques lead to scalable implementations in which complexity can be reduced when needed, at the cost of small reductions in video quality. Our hybrid algorithm can speed up the DCT and quantization process by close to a factor of 4 compared to fixed-complexity forward DCT implementations, with only a slight degradation in PSNR.
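
As a rough illustration of frequency selection as described above, the following sketch computes only the k x k low-frequency coefficients of an 8 x 8 block and then quantizes them. It is not the algorithm analyzed in the paper: the function names (`dct_basis`, `forward_dct_pruned`, `quantize`), the uniform quantizer with a single step size `qstep`, and the direct matrix-product form (rather than a fast DCT factorization) are all assumptions made for illustration.

```python
import math

N = 8  # block size assumed for this sketch


def dct_basis(n=N):
    """Return the orthonormal DCT-II basis matrix C, so that Y = C X C^T."""
    basis = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                      for x in range(n)])
    return basis


C = dct_basis()


def forward_dct_pruned(block, k):
    """Compute only the k x k low-frequency DCT coefficients of a square block.

    Coefficients outside the k x k corner are never computed and are left at
    zero, which is the frequency-selection idea. With k equal to the block
    size this reduces to the full 2-D DCT.
    """
    n = len(block)
    # First stage, restricted to the first k vertical frequencies.
    tmp = [[sum(C[u][x] * block[x][y] for x in range(n)) for y in range(n)]
           for u in range(k)]
    out = [[0.0] * n for _ in range(n)]
    # Second stage, restricted to the first k horizontal frequencies.
    for u in range(k):
        for v in range(k):
            out[u][v] = sum(tmp[u][y] * C[v][y] for y in range(n))
    return out


def quantize(coeffs, qstep):
    """Uniform quantizer with one step size (a simplifying assumption)."""
    return [[int(round(c / qstep)) for c in row] for row in coeffs]


# A smooth block: the skipped high-frequency coefficients are smaller than
# half the quantization step, so pruning changes nothing after quantization.
block = [[100 + x + y for y in range(N)] for x in range(N)]
same = quantize(forward_dct_pruned(block, 2), 16) == quantize(forward_dct_pruned(block, N), 16)
print(same)
```

For this smooth block and a coarse step, the pruned and full transforms quantize to the same levels, which is the regime in which the approximation error stays below the quantization error. For textured blocks or fine quantization the two outputs would differ, which is why the approximation level has to be chosen per block.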