Abstract

This paper presents a customized Systolic Array Architecture (SAA) for Dual Tree Complex Wavelet Transform (DTCWT) sub-band computation based on a multiplexed Distributive Arithmetic Algorithm (DAA). The proposed architecture is memory efficient and operates at frequencies greater than 300 MHz when decomposing 256 x 256 input images. Three architectures (a reduced order structure, a multiplexed DA structure, and a zero pad structure) are designed and evaluated for DTCWT computation, minimizing arithmetic operations while improving latency. The proposed design is modeled in Verilog HDL and implemented on Spartan-6 and Virtex-5 FPGAs using the Xilinx ISE FPGA design flow. The latency of the proposed architectures is 15 clock cycles, and the throughput is estimated at 4 outputs for every 5 clock cycles. The SAA occupies less than 12% of FPGA resources and consumes less than 10 mW of power on the FPGA platform.

Highlights

  • Wavelets have played an important role in signal and image processing applications because they support localization in both time and frequency

  • Divakara et al. [25] have reported an FPGA implementation of the Dual Tree Complex Wavelet Transform (DTCWT) for image processing applications based on a reorder and symmetric structure

  • We propose three architectures for DTCWT computation that optimize area and timing requirements


Summary

Introduction

Wavelets have played an important role in signal and image processing applications because they support localization in both time and frequency. Simplified structures for computing the DTCWT are presented in [5,6,7]; they require two real DWT filter bank structures, or two critically sampled DWTs, that process the input data in parallel. The first stage, comprising two filter pairs, processes the input image along the rows to generate output samples {y1, y2, y3, y4}. For an image of size N x N, row processing with one filter requires 10N^2 multipliers and 9N^2 adders. Processing the input data with 12 filters (first and second stages together) requires a total of 120N^2 multiplication and 108N^2 addition operations. Implementing the DTCWT on an FPGA platform therefore requires optimizing the number of arithmetic operations and memory elements. Some of the most popular methods for DWT implementation that improve speed and optimize area are reviewed, since they provide insight into the improved methods proposed in this work for DTCWT implementation.
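To give a sense of scale for these counts, the short sketch below evaluates them for the 256 x 256 image size mentioned in the abstract. It only restates the arithmetic quoted above (10N^2 multiplications and 9N^2 additions per filter, 12 filters in total); the function name and parameter names are hypothetical and chosen here for illustration.

```python
def dtcwt_op_counts(n, filters=12, mults_per_sample=10, adds_per_sample=9):
    """Return (multiplications, additions) for a direct DTCWT of an n x n image.

    Assumes the per-filter costs quoted in the text: 10*n^2 multiplications
    and 9*n^2 additions per filter, applied across `filters` filters.
    """
    samples = n * n
    mults = filters * mults_per_sample * samples
    adds = filters * adds_per_sample * samples
    return mults, adds


if __name__ == "__main__":
    mults, adds = dtcwt_op_counts(256)
    print(f"multiplications: {mults:,}")  # 120 * 256^2
    print(f"additions:       {adds:,}")   # 108 * 256^2
```

For N = 256 this comes to roughly 7.9 million multiplications and 7.1 million additions per image, which illustrates why reducing arithmetic operations matters for an FPGA implementation.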

Review of high speed architectures
DTCWT architecture design
Reduced order architecture
Multiplexed DA architecture
Zero pad architecture
Systolic array architecture
Comparison of DTCWT architectures
FPGA Implementation
Findings
Conclusion