Abstract
This paper presents a customized Systolic Array Architecture (SAA) design for Dual Tree Complex Wavelet Transform (DTCWT) sub-band computation based on a multiplexed Distributive Arithmetic Algorithm (DAA). The proposed architecture is memory efficient and operates at frequencies greater than 300 MHz when decomposing 256 × 256 input images. Three architectures, namely a reduced order structure, a multiplexed DA structure and a zero pad structure, are designed and evaluated for their performance in DTCWT computation, minimizing arithmetic operations with improved latency. The proposed design is modeled in Verilog HDL and implemented on Spartan-6 and Virtex-5 FPGAs using the Xilinx ISE FPGA design flow. The latency of the proposed architectures is evaluated to be 15 clock cycles, and the throughput is estimated to be 4 outputs for every 5 clock cycles. The SAA architecture occupies less than 12% of FPGA resources and consumes less than 10 mW of power on the FPGA platform.
Highlights
Wavelets have played an important role in signal and image processing applications, supporting both time and frequency localization properties.
Divakara et al. [25] have reported an FPGA implementation of the Dual Tree Complex Wavelet Transform (DTCWT) for image processing applications based on reorder and symmetric structures.
We propose three architectures for DTCWT computation that optimize area and timing requirements.
Summary
Wavelets have played an important role in signal and image processing applications, supporting both time and frequency localization properties. Simplified structures for computing the DTCWT are presented in [5,6,7]; they require two real DWT filter bank structures, or two critically sampled DWTs, that process the input data in parallel. The first stage, comprising two filter pairs, processes the input image along the rows to generate output samples represented as {y1, y2, y3, y4}. For an image of size N × N, row processing with one filter requires 10N² multiplier operations and 9N² adder operations. Processing the input data with 12 filters (first and second stages together) requires a total of 120N² multiplier operations and 108N² adder operations. Implementing the DTCWT on an FPGA platform therefore requires optimizing the number of arithmetic operations and memory elements. A few of the most popular methods for DWT implementation that improve speed and optimize area are reviewed, providing insight into the improved methods proposed in this work for DTCWT implementation.
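To make the operation counts quoted above concrete, the short sketch below evaluates them for the 256 × 256 image size mentioned in the abstract. The function name `dtcwt_op_counts` and its parameters are hypothetical, introduced only for illustration. The per-sample costs (10 multiplier and 9 adder operations per output sample per filter) are taken as implied by the 10N² and 9N² figures; they are an assumption here, not a statement about the specific filters used in the paper.

```python
def dtcwt_op_counts(n, filters=12, mults_per_sample=10, adds_per_sample=9):
    """Count direct-form arithmetic operations for an n x n image.

    Assumes every filter produces n*n output samples and that each sample
    costs `mults_per_sample` multiplications and `adds_per_sample` additions
    (hypothetical parameters matching the 10N^2 / 9N^2 figures in the text).
    Returns ((mults, adds) for one filter, (mults, adds) for all filters).
    """
    samples = n * n
    per_filter = (mults_per_sample * samples, adds_per_sample * samples)
    total = (filters * per_filter[0], filters * per_filter[1])
    return per_filter, total


per_filter, total = dtcwt_op_counts(256)
print(per_filter)  # (655360, 589824)
print(total)       # (7864320, 7077888)
```

Under these assumptions, a direct implementation needs roughly 7.9 million multiplier operations and 7.1 million adder operations per 256 × 256 image, which illustrates why reducing arithmetic operations matters for an FPGA implementation.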
More From: Informacije MIDEM - Journal of Microelectronics, Electronic Components and Materials