Abstract

Numerous VLSI architectures for 2-D discrete wavelet transform (DWT) have been brought forward. While most of the designs displayed good performance through parallel processing, few of them addressed thoroughly how to sustain such high throughput computing which is crucial in real-time applications. Although the affordable data transfer bandwidth has been increased tremendously during the past decade, the pressure on data communication has not yet been relieved from stream-intensive applications. The design of 2-D DWT belongs to such cases. In this paper, we expose the performance gap between the computing core and the entire system, distinguishing them by quantitative approach with metrics of peak performance and mean-time performance. In order to narrow down the discrepancy without degrading either of the two criteria, on the one hand, we introduce a software-pipelining lifting-based computing kernel to remove data dependence for peak performance, on the other hand, we apply loop fusing technique and a hierarchical pipelining method to enhance data locality and boost the mean-time performance. The architecture has been implemented in Xilinx Virtex-II FPGA, taking advantage of Virtex-II’s embedded multipliers and block RAMs. We use Daubechies (9, 7) and LeGall (5, 3) filters (the default lossy and lossless filters in JPEG2000) for illustration whereas it is a general method for other DWT filters. The post-place and routing operation frequency for Daubechies (9, 7) is 138 MHz. Notably, the mean-time performance parameterized by image size and decomposition level achieves closely to peak performance.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call