Novel decomposed lifting scheme (DLS) is presented to perform one-dimensional (1D) discrete wavelet transform (DWT) with consistent data flow in both row and column dimension. Based on the proposed DLS, intermediate data can be transferred seamlessly between the column processor and the row processor in the hardware implementation of two-dimensional (2D) DWT, resulting in the reduction of on-chip memory, output latency and control complexity. Moreover, the implementation of 2D DWT can be easily extended to achieve higher processing speed with controlled increase of hardware cost. Memory-efficient and high-speed architectures are proposed to implement 2D DWT for JPEG2000, which are called fast architecture (FA) and high-speed architecture (HA). FA and HA can perform 2D DWT in N 2 /2 and N 2 /4 clock cycles for an N×N image, respectively, but the required internal memory is only 4N for 9/7 DWT and 2N for 5/3 DWT. Compared with the works reported in previous literature, the proposed designs provide excellent performance in hardware cost, control complexity, output latency and computing time. The proposed designs were implemented to process 2D 9/7 DWT in SMIC 0.18 μm CMOS logic fabrication with 4 KB internal memory for the image size 512?×?512. The areas are only 999137 um 2 and 1333054 um 2 for FA and HA, respectively, but the operation frequency can be up to 150 MHz.
Read full abstract