Abstract

This paper presents experimental results comparing a full software (SW) implementation with a software/hardware (SW/HW) co-design implementation of a DSP algorithm on a Xilinx Zynq programmable system-on-chip (SoC). The case study is the 8x8 two-dimensional discrete cosine transform (2D DCT), which is used in the popular JPEG and MPEG4-AVC encoders. The full SW design is implemented on a hard-core ARM processor on the FPGA SoC. The SW/HW co-design uses both the ARM processor and the configurable logic blocks (CLBs) of the FPGA SoC, with the communication channel implemented using Xillybus FIFO buffers held in external DRAM. In this design, the core 2D DCT operations are executed on the CLBs, while data initialization and transfers are handled by the processor. Results show that the SW implementation is faster than the SW/HW implementation for data inputs smaller than 0.48 megapixels. As the input size grows, the SW/HW implementation performs better, running up to 2x faster for a 2-megapixel input. This study demonstrates the viability of implementing the 2D DCT operations as a dedicated hardware accelerator in multimedia encoders.
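
For readers unfamiliar with the kernel being accelerated, the following is a minimal sketch of an 8x8 2D DCT computed by row-column decomposition (a 1D DCT over each row, then over each column). It assumes the textbook orthonormal DCT-II; the abstract does not state which DCT variant, scaling, or fixed-point format the authors used, and the function names `dct_1d` and `dct_2d` are hypothetical and chosen only for illustration.

```python
import math

N = 8  # block size of the 8x8 transform discussed in the abstract


def dct_1d(v):
    """Orthonormal 1D DCT-II of a length-N sequence (assumed variant)."""
    out = []
    for k in range(N):
        # Scale factors that make the transform orthonormal.
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        acc = sum(
            v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for n in range(N)
        )
        out.append(scale * acc)
    return out


def dct_2d(block):
    """2D DCT of an 8x8 block: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    # cols[j] is the 1D DCT of column j of the row-transformed block.
    cols = [dct_1d([rows[i][j] for i in range(N)]) for j in range(N)]
    # Transpose back so that result[i][j] is the coefficient at (i, j).
    return [[cols[j][i] for j in range(N)] for i in range(N)]
```

Each 8x8 block is transformed independently of every other block, which is one reason this kernel is a natural candidate for offloading to dedicated logic once the input is large enough to amortize the transfer cost.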
