An FPGA-Based Hardware Accelerator for Real-Time Block-Matching and 3D Filtering

Dong Wang, Jia Xu, Ke Xu

doi:10.1109/access.2020.3006773

Abstract

The block-matching and 3D filtering (BM3D) denoising algorithm has been employed in many application fields because of its superior image processing quality. Because of its huge computational workload, real-time implementation of this algorithm is very challenging. Recent studies on accelerating the BM3D algorithm on GPUs have reported impressive speedups over CPU-based implementations. However, GPU devices are generally energy-inefficient and are thus not well suited to embedded application scenarios. In this paper, we propose a dedicated hardware accelerator design that efficiently accelerates the BM3D algorithm with reduced power consumption on an FPGA device. The proposed design is based on a deeply pipelined OpenCL kernel architecture that speeds up the compute-intensive procedures of the denoising algorithm by exploiting its intrinsic parallelism and maximizing data reuse. The final design was implemented on Intel's Arria-10 GX1150 FPGA and achieved an average 1.2× performance boost and an 8.3× reduction in energy dissipation compared to a state-of-the-art GPU-based software design.

Highlights

Image denoising plays an important role in image and video processing and has become one of the most fundamental technologies in many fields, such as digital cameras [1], medical image processing [2] and computer vision [3]
We propose a performance-improved field-programmable gate array (FPGA) accelerator design for real-time processing of the block-matching and 3D filtering algorithm, based on our previous study [17]
Experimental setup: to evaluate the performance of the proposed accelerator, we implemented the design on Intel's A10 FPGA development board

Summary

INTRODUCTION

Image denoising plays an important role in image and video processing and has become one of the most fundamental technologies in many fields, such as digital cameras [1], medical image processing [2] and computer vision [3]. The detailed contributions of this study are:

(1) We present a quantitative analysis of the complexity of each function of the BM3D algorithm and propose an accelerator architecture based on deeply pipelined OpenCL kernels to implement the partitioned sub-algorithms.
(2) We develop a dedicated systolic-like array architecture for parallel block-matching that exploits the fine-grained data-level parallelism of the algorithm through pipelining and, at the same time, saves a large amount of hardware resources by avoiding the very wide data buses otherwise needed to support high-throughput computation.
(3) We introduce a parallel line-buffer-based on-chip data caching scheme that maximizes data reuse and greatly reduces the demand on external memory bandwidth.
(4) We implemented the proposed design on Intel's Arria-10 GX1150 FPGA device; experimental results show that the design gains more than 20% in performance while achieving an 8.3× advantage in power consumption over a state-of-the-art GPU-based design.

We have verified that this algorithm optimization has no obvious impact on denoising quality.
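As a rough illustration of the block-matching step that contribution (2) parallelizes, the sketch below performs an exhaustive search in plain Python. It is not the paper's systolic-array or OpenCL design. The function names `block_distance` and `match_blocks`, the sum-of-squared-differences metric, and the parameters `n`, `search` and `max_matches` are assumptions chosen for illustration only.

```python
def block_distance(img, ay, ax, by, bx, n):
    """Sum of squared differences between two n-by-n blocks of img.

    (ay, ax) and (by, bx) are the top-left corners of the two blocks.
    """
    return sum(
        (img[ay + dy][ax + dx] - img[by + dy][bx + dx]) ** 2
        for dy in range(n)
        for dx in range(n)
    )


def match_blocks(img, ref_y, ref_x, n=4, search=3, max_matches=4):
    """Return up to max_matches (distance, y, x) tuples, most similar first.

    Candidates are all n-by-n blocks whose top-left corner lies within
    +/-search pixels of the reference block and that fit inside the image.
    The reference block itself is a candidate (distance 0).
    """
    height, width = len(img), len(img[0])
    y_lo, y_hi = max(0, ref_y - search), min(height - n, ref_y + search)
    x_lo, x_hi = max(0, ref_x - search), min(width - n, ref_x + search)

    candidates = []
    for y in range(y_lo, y_hi + 1):
        for x in range(x_lo, x_hi + 1):
            dist = block_distance(img, ref_y, ref_x, y, x, n)
            candidates.append((dist, y, x))

    # Sort by distance first; ties are broken by position (y, then x).
    candidates.sort()
    return candidates[:max_matches]
```

On a test image whose rows repeat every four pixels, a 4×4 reference block at the origin should find an identical block four pixels to its right. The nested loops make the cost visible: every candidate needs n² subtractions and multiplications per reference block, which is the kind of regular, data-parallel workload that a pipelined array of processing elements can share.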

COLLABORATIVE DENOISE FILTERING

AGGREGATION

AGGREGATION KERNEL

CONCLUSION

Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 5	License type: CC BY 4.0
