Abstract
This paper presents experimental results comparing a full software (SW) implementation with a software/hardware (SW/HW) co-design implementation of a DSP algorithm on a Xilinx Zynq programmable system-on-chip (SoC). The case study is the 8x8 two-dimensional discrete cosine transform (2D DCT), which is used in the popular JPEG and MPEG4-AVC encoders. The full SW design is implemented on the hard-core ARM processor of the FPGA SoC. The SW/HW co-design uses both the ARM processor and the configurable logic blocks (CLBs) of the FPGA SoC, with the communication channel between them implemented using Xillybus FIFO buffers implemented in external DRAM. In this design, the core 2D DCT operations are executed on the CLBs, while data initialization and transfers are handled by the processor. Results show that the SW implementation is faster than the SW/HW implementation for input sizes below 0.48 megapixels. For larger inputs, the SW/HW implementation performs better, reaching up to a 2x speedup at an input size of 2 megapixels. This study demonstrates the viability of implementing the 2D DCT as a dedicated hardware accelerator in multimedia encoders.