This paper proposes low‐latency, high‐throughput texture coding architectures that implement the 4×4 integer and Hadamard transforms, quantization (Q), and inverse quantization (IQ) for H.264/AVC. Using matrix decompositions, efficient fast two‐dimensional (2‐D) 4×4 transforms are derived from the proposed fast one‐dimensional (1‐D) 4×4 transforms. The fast 2‐D 4×4 transform designs use a hardware‐sharing architecture, achieve high throughput, and have a latency of only one clock cycle. The proposed cost‐effective, hardware‐sharing fast 2‐D 4×4 transform scheme requires no transpose memory and can be applied to 4CIF 4:2:0 video encoding. A hardware‐sharing architecture for both Q and IQ is also developed for low‐cost applications. In Xilinx FPGA verification, the proposed low‐cost 4×4 texture coding design, which supports CIF 4:2:0 video encoding at 30 frames/sec, runs at up to 84 MHz with a gate count of 90 k. The proposed high‐speed 4×4 texture coding design, which supports 4CIF 4:2:0 video encoding at 30 frames/sec, runs at up to 99 MHz with a gate count of 135 k. Both proposed texture coding architectures have a latency of only 4 clock cycles, which is lower than that of traditional row‐column architectures.
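
The idea of obtaining a 2‐D 4×4 transform from a 1‐D one without a transpose step can be illustrated in software. The following is a minimal sketch, not the paper's architecture: it assumes the standard H.264/AVC forward core transform matrix (here named CF), and it uses a Kronecker product as one possible way to express the 2‐D transform as a single pass over the 16 input samples. The function names (fast_1d, row_column_2d, direct_2d, kron) are hypothetical and chosen only for this illustration; the paper's actual matrix decomposition is not given in the abstract.

```python
# Standard H.264/AVC 4x4 forward core transform matrix
# (the scaling factors are normally folded into quantization).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def fast_1d(x):
    """Butterfly form of CF @ x using only adds, subtracts and shifts."""
    s03, d03 = x[0] + x[3], x[0] - x[3]
    s12, d12 = x[1] + x[2], x[1] - x[2]
    return [s03 + s12, (d03 << 1) + d12, s03 - s12, d03 - (d12 << 1)]


def row_column_2d(block):
    """Reference row-column method: Y = CF * X * CF^T in two 1-D passes.

    The intermediate result must be re-read in transposed order, which is
    the role a transpose memory plays in a hardware row-column design.
    """
    # First pass: transform each column of X; cols[c][u] = (CF * X)[u][c].
    cols = [fast_1d([block[r][c] for r in range(4)]) for c in range(4)]
    # Second pass: transform each row of CF * X (transposed access to cols).
    return [fast_1d([cols[c][u] for c in range(4)]) for u in range(4)]


def kron(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def direct_2d(block):
    """Single-pass 2-D transform: vec(Y) = (CF kron CF) * vec(X), row-major.

    Assumption for illustration: all 16 samples are available at once, so
    no intermediate transposed storage is needed.
    """
    k = kron(CF, CF)
    x = [v for row in block for v in row]
    y = [sum(k[i][j] * x[j] for j in range(16)) for i in range(16)]
    return [y[4 * r:4 * r + 4] for r in range(4)]


if __name__ == "__main__":
    sample = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    print(row_column_2d(sample) == direct_2d(sample))
```

Both functions compute the same coefficients; the difference is only in how the data are accessed. A hardware design would not multiply by a dense 16×16 matrix but would factor it into sparse butterfly stages, which is presumably where a matrix decomposition of the kind the abstract mentions comes in.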