Slice-delay Product Research Articles

In this paper we have presented a memory-efficient architecture for 3-D DWT using overlapped grouping of frames. The proposed structure does not involve any line-buffer or frame-buffer for 1-level 3-D DWT. It requires only a frame-buffer of size O(MN) to compute multilevel 3-D DWT, unlike the existing folded structures, which require a frame-buffer of size O(MNR). Saving the line-buffer and frame-buffer in the first-level DWT is a substantial advantage, since the frame size is very often as large as 1920 × 1080 and the frame rate varies from 15 to 60 fps. The proposed structure has a small cycle period and lower output latency than the existing structures. Compared with the best of the available designs, the proposed design involves significantly fewer memory words. For a frame size of 176 × 144 and a frame rate of 60 fps, the proposed structure involves 7.96 times fewer memory words and 12.3% less average computation time (ACT) than the best of the existing folded designs. It involves 4.28 times fewer memory words than the recently proposed parallel design. Synthesis results for a frame size of 176 × 144 and a frame rate of 60 fps on the FPGA device 6VLX760FF1760-2 show that the proposed structure uses 9.6 times fewer BRAMs and offers 2 times higher throughput than the folded design. It uses 1.9 times fewer BRAMs than the parallel design while offering nearly the same throughput. The proposed structure has a significantly smaller slice-delay product (SDP) and dissipates significantly less dynamic power than the existing structures.
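The abstract does not specify the wavelet filter or how the frames are grouped. As a rough illustration of why temporal filtering need not buffer a long run of frames, the sketch below assumes the LeGall 5/3 integer lifting filter along the frame axis with symmetric extension. Under that assumption each temporal output pair (s[n], d[n]) depends only on the five frames x[2n-2] to x[2n+2], and consecutive windows overlap by three frames. The function names are hypothetical and this is not the paper's architecture; the windowed version is simply checked against a whole-sequence reference.

```python
import numpy as np


def temporal_53_full(frames):
    """Reference: one level of 5/3 integer lifting over the whole frame sequence."""
    x = np.asarray(frames, dtype=np.int64)             # shape (T, rows, cols)
    even, odd = x[0::2], x[1::2]
    even_next = np.concatenate([even[1:], even[-1:]])  # x[2n+2], mirrored at the end
    d = odd - ((even + even_next) >> 1)                # predict: high-pass d[n]
    d_prev = np.concatenate([d[:1], d[:-1]])           # d[n-1], with d[-1] = d[0]
    s = even + ((d_prev + d + 2) >> 2)                 # update: low-pass s[n]
    return s, d


def temporal_53_windowed(frames):
    """Same transform, but each output pair reads only a 5-frame window."""
    frames = [np.asarray(f, dtype=np.int64) for f in frames]
    T = len(frames)
    if T < 4 or T % 2:
        raise ValueError("this sketch needs an even number of frames, at least 4")
    # Symmetric extension: x[-2] = x[2], x[-1] = x[1], x[T] = x[T-2].
    # (The list is kept whole for simplicity; each iteration reads only 5 entries.)
    ext = [frames[2], frames[1]] + frames + [frames[T - 2]]
    low, high = [], []
    for n in range(T // 2):
        xm2, xm1, x0, x1, x2 = ext[2 * n: 2 * n + 5]  # x[2n-2] .. x[2n+2]
        d_prev = xm1 - ((xm2 + x0) >> 1)               # d[n-1]
        d = x1 - ((x0 + x2) >> 1)                      # d[n]
        low.append(x0 + ((d_prev + d + 2) >> 2))       # s[n]
        high.append(d)
    return np.stack(low), np.stack(high)


# Toy sizes: 8 frames of 4 x 6 pixels.
video = np.random.default_rng(0).integers(0, 256, size=(8, 4, 6))
lo_w, hi_w = temporal_53_windowed(video)
lo_f, hi_f = temporal_53_full(video)
print(np.array_equal(lo_w, lo_f) and np.array_equal(hi_w, hi_f))  # True
```

Filtering each temporal subband along rows and columns would complete one level of a separable 3-D DWT; that step is omitted to keep the example short.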

In this paper, we present a modular and pipelined architecture for lifting-based multilevel 2-D DWT that uses neither a line-buffer nor a frame-buffer. The overall area-delay product is reduced by appropriately partitioning and scheduling the computation of the individual decomposition levels. The different levels are processed by a cascaded pipeline structure to maximize the hardware utilization efficiency (HUE). Moreover, the proposed structure is scalable for high-throughput and area-constrained implementations. We have removed all the redundancies resulting from decimated wavelet filtering to maximize the HUE. The proposed design involves L pyramid algorithm (PA) units and one recursive pyramid algorithm (RPA) unit, where R = N/P, L = ⌈log₄ P⌉, P is the input block size, and M and N are, respectively, the height and width of the image. The proposed structure computes the entire multilevel DWT in MR cycles. Its output latency is O(8R × 2^L) cycles, which is very small compared with the latency of the existing structures. Notably, the proposed structure does not require any line-buffer or frame-buffer, whereas the existing folded structures require a line-buffer of size O(N) and a frame-buffer of size O(M/2 × N/2) for multilevel 2-D computation. In place of those buffers, the proposed structure uses only local registers and a RAM of size O(N). Saving the line-buffer and frame-buffer is an important advantage, since the image size is very often as large as 512 × 512. The simulation results show that the proposed scalable structure offers a better slice-delay product (SDP) at higher throughput, since its on-chip memory remains almost unchanged as the input block size increases.
On average, it has 17% less SDP than the best of the corresponding existing structures across different input-block sizes and image sizes. It involves 1.92 times more transistors, but offers 12.2 times higher throughput and consumes 52% less power per output (PPO) than the existing structure it is compared with, on average across different input sizes.
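The parameter definitions in the abstract can be evaluated directly. The sketch below uses a hypothetical function name, computes L = ⌈log₄ P⌉ with integer arithmetic to avoid floating-point rounding in the logarithm, and assumes that P divides N, which the abstract does not state. One plausible reading of L (an interpretation, not a claim from the paper) is that each 2-D decomposition level quarters the number of samples, so after about log₄ P levels a further dedicated PA unit would receive less than one sample per cycle, and the remaining levels are handled by the single RPA unit.

```python
def design_params(M: int, N: int, P: int) -> dict:
    """Evaluate the abstract's definitions for an M x N image and block size P.

    Hypothetical helper; assumes P >= 1 and that P divides N.
    """
    if P < 1 or N % P:
        raise ValueError("this sketch assumes P >= 1 and that P divides N")
    R = N // P                # R = N/P
    L = 0                     # L = ceil(log4 P), using integers only
    while 4 ** L < P:
        L += 1
    return {
        "R": R,
        "L": L,
        "PA units": L,        # L pyramid algorithm (PA) units
        "RPA units": 1,       # one recursive pyramid algorithm (RPA) unit
        "cycles": M * R,      # the abstract states the full multilevel DWT takes MR cycles
    }


print(design_params(512, 512, 16))
# {'R': 32, 'L': 2, 'PA units': 2, 'RPA units': 1, 'cycles': 16384}
```

Doubling P from 8 to 16 keeps L = 2 but halves R, and with it the MR cycle count, which is consistent with the abstract's point that the structure scales toward higher throughput.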

Articles published on Slice-delay Product

Performance Analysis in Higher-Order IIR Filter Structures with Application to EEG Signal

A Resource-Efficient Multiplierless Systolic Array Architecture for Convolutions in Deep Networks

Precomputation‐based radix‐4 CORDIC for approximate rotations and Hough transform

Area and Speed Efficient Implementation of Symmetric FIR Digital Filter through Reduced Parallel LUT Decomposed DA Approach

Implementation of Efficiency CORDIC Algorithm for Sine & Cosine Generation

Area-Time Efficient Scaling-Free CORDIC Using Generalized Micro-Rotation Selection

Memory-Efficient Architecture for 3-D DWT Using Overlapped Grouping of Frames

Memory Efficient Modular VLSI Architecture for Highthroughput and Low-Latency Implementation of Multilevel Lifting 2-D DWT

Efficient CORDIC Algorithms and Architectures for Low Area and High Throughput Implementation
