This paper presents an energy- and area-efficient architecture for an approximate discrete cosine transform (DCT). Because of its good compression ability, the DCT is widely used in signal processing. However, it is computationally intensive, especially for large transform sizes. In this paper, we reduce the computation cost of the DCT by truncating a few least significant bits (LSBs) and most significant bits (MSBs) and by skipping all-zero columns. First, because the final right-shift operation weakens the contribution of the LSBs, we eliminate the computation for some of them. For the addition of the remaining LSBs, a parallel carry propagation adder is proposed to reduce the calculation latency. Second, because high-frequency components are quite small in natural scenes, a few MSBs are selectively truncated according to their positions. Third, quantization is taken into account for system-level optimization: columns whose quantized results are all zero are used to skip the subsequent column transforms. The experimental results show that the proposed architecture reduces area by up to 32% and power consumption by up to 60% compared with the original exact DCT, while the compression efficiency loss caused by the DCT approximation is negligible for High Efficiency Video Coding.
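The LSB argument can be illustrated with a small software model. The following Python sketch is not the proposed hardware: the function names, the shift of 7, and the choice of dropping k = 2 bits per product are assumptions made only for illustration, and the abstract does not state which bits the architecture actually removes. The sketch only shows why bits discarded before a final right shift have a bounded effect on the result.

```python
# 4-point integer DCT matrix of the kind used in HEVC (illustrative input only).
DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def exact_dot_shift(coeffs, samples, shift):
    """Full-precision multiply-accumulate followed by a rounded right shift."""
    acc = sum(c * x for c, x in zip(coeffs, samples))
    return (acc + (1 << (shift - 1))) >> shift


def truncated_dot_shift(coeffs, samples, shift, k):
    """Hypothetical approximation: drop the k LSBs of every product before
    accumulating, then shift by the remaining amount (assumes 0 <= k < shift)."""
    acc = sum((c * x) >> k for c, x in zip(coeffs, samples))
    remaining = shift - k
    return (acc + (1 << (remaining - 1))) >> remaining


if __name__ == "__main__":
    samples = [10, -3, 7, 120]  # arbitrary example inputs
    for row in DCT4:
        exact = exact_dot_shift(row, samples, shift=7)
        approx = truncated_dot_shift(row, samples, shift=7, k=2)
        print(row, exact, approx)
```

In this model each product loses less than 2^k, so with four products and k = 2 the accumulated loss is at most 4 · (2^2 − 1) = 12, which is small against the 2^7 divisor of the final shift. Because Python's `>>` rounds toward negative infinity, the truncated result is never larger than the exact one and differs from it by at most 1 here.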