Abstract

This paper proposes an effective hardware accelerator for the 2D $8\times 8$ discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) based on an improved Loeffler architecture. The accelerator optimizes the data flow of the Loeffler 8-point 1D DCT/IDCT according to the characteristics of image and video processing. An 8-stage pipeline increases processing speed by partitioning the computation across clock cycles and simplifying the arithmetic performed in each cycle. The DCT coefficients are approximated without multipliers, using only adders and shifters, by combining fixed-point representation with canonic signed digit (CSD) coding. In particular, the proposed fast parallel transposed matrix architecture performs row-column coefficient conversion with lower circuit complexity. The FPGA implementation on a Virtex-7 XC7VX330T device runs at 288 MHz with a throughput of 558 Mpixel/s and sustains a Full HD real-time frame rate of up to 269 fps. Only 33 cycles are required to complete the 2D DCT/IDCT of an $8\times 8$ block, making the design suitable as a high-performance hardware accelerator for image and video compression encoding.
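To make the multiplier-free idea concrete, the following Python sketch shows how a fixed-point constant can be recoded into CSD digits and applied with shifts and adds only. It is an illustration under stated assumptions, not the paper's circuit: the function names, the 8-bit fractional width (`FRAC_BITS`), and the choice of cos(π/16) as the example coefficient are all hypothetical. The last two helpers check the reported figures, assuming throughput is counted as 64 pixels every 33 cycles and Full HD means 1920×1080; that assumption is consistent with the stated 558 Mpixel/s and 269 fps.

```python
import math

FRAC_BITS = 8  # assumed fixed-point fractional width, not taken from the paper


def to_csd(value: int) -> list[int]:
    """Recode a non-negative integer into CSD digits (-1, 0, +1), LSB first."""
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 -> digit +1, value mod 4 == 3 -> digit -1,
            # which guarantees the next digit is zero (no adjacent nonzeros).
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x: int, digits: list[int]) -> int:
    """Multiply x by the constant encoded in digits using shifts and adds/subtracts."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


def throughput_pixels_per_sec(clock_hz: float, cycles_per_block: int,
                              pixels_per_block: int = 64) -> float:
    """Pixels per second if one 8x8 block completes every cycles_per_block cycles."""
    return clock_hz * pixels_per_block / cycles_per_block


def frames_per_sec(pixel_rate: float, width: int = 1920, height: int = 1080) -> float:
    """Frame rate for a given pixel rate and frame size (Full HD by default)."""
    return pixel_rate / (width * height)


if __name__ == "__main__":
    # Example coefficient: cos(pi/16) scaled to fixed point (illustrative choice).
    coeff = round(math.cos(math.pi / 16) * (1 << FRAC_BITS))
    digits = to_csd(coeff)
    nonzero = sum(1 for d in digits if d != 0)
    print(f"coeff={coeff}, CSD digits (LSB first)={digits}, adders needed={nonzero - 1}")

    sample = 100
    product = csd_multiply(sample, digits) >> FRAC_BITS  # rescale after the multiply
    print(f"{sample} * cos(pi/16) is approximately {product}")

    rate = throughput_pixels_per_sec(288e6, 33)
    print(f"throughput = {rate / 1e6:.1f} Mpixel/s, Full HD = {frames_per_sec(rate):.1f} fps")
```

CSD recoding never produces two adjacent nonzero digits, so a constant typically needs fewer add/subtract terms than its plain binary form, which is what lets each coefficient multiplication be built from a small number of shifted additions.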
