High-Level Synthesis Design for Stencil Computations on FPGA with High Bandwidth Memory

Changdao Du, Yoshiki Yamaguchi

doi:10.3390/electronics9081275

Abstract

Due to performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computations. Meanwhile, with the help of high-level synthesis (HLS) compilers, FPGAs can be programmed using common programming languages such as C, C++, or OpenCL, thereby improving design efficiency and portability. Stencil computations are significant kernels in various scientific applications. In this paper, we introduce an architecture design for implementing stencil kernels on a state-of-the-art FPGA with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory, e.g., DDR3 or DDR4, whose bandwidth limits design space exploration in the spatial domain of stencil kernels. Therefore, many previous studies relied mainly on exploiting parallelism in the temporal domain to work around the bandwidth limitation. In our approach, we scale up the design performance by considering both the spatial and temporal parallelism of the stencil kernel equally. We also discuss design portability among different HLS compilers. We use typical stencil kernels to evaluate our design on a Xilinx U280 FPGA board and compare the results with other existing studies. By adopting our method, developers can choose from a broad range of parallelization strategies based on specific FPGA resources to improve performance.

Highlights

Over the past few years, offloading high-performance computing (HPC) applications to dedicated hardware accelerators has been a widely used solution [1,2].
To match the performance of GPGPUs, existing studies mainly rely on the temporal parallelism of the stencil kernel, thereby shifting the bottleneck of stencil computations from a memory bandwidth limitation to an FPGA hardware resource limitation.
Suppose we only use temporal parallelism to achieve the same performance as in 4 × (2M + 1) stencil cells

Summary

Introduction

Over the past few years, offloading high-performance computing (HPC) applications to dedicated hardware accelerators has been a widely used solution [1,2]. To match the performance of GPGPUs, existing studies mainly rely on the temporal parallelism of the stencil kernel, thereby shifting the bottleneck of stencil computations from a memory bandwidth limitation to an FPGA hardware resource limitation. Optimization strategies such as building on-chip sliding window buffers, replication and/or vectorization of computing units, and stream processing were discussed in these papers. This relies on the corresponding compiler to automatically partition memory resources to support parallel memory access [14,15], resulting in inefficient utilization of BRAM resources and redundant memory costs when scaling the design performance with temporal parallelism. This limits the design scalability to one spatial dimension [16,17], which misses potential computing optimization opportunities of some stencil kernels.
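The on-chip sliding window buffer mentioned above can be illustrated with a minimal software sketch. It is not the architecture proposed in the paper: the 5-point Jacobi kernel, the coefficient 0.2, the pass-through boundary handling, and the function names `stencil_naive` and `stencil_window` are all assumptions chosen for illustration. The point it shows is that a stream of grid cells read once in row-major order, plus a shift buffer of 2W + 1 elements, is enough to supply every neighbor of a radius-1 stencil on a grid of width W.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version: random access into the full grid (row-major, W x H).
// Boundary cells are copied through unchanged (an assumption for this sketch).
std::vector<float> stencil_naive(const std::vector<float>& in,
                                 std::size_t W, std::size_t H) {
    assert(in.size() == W * H);
    std::vector<float> out(in);
    for (std::size_t y = 1; y + 1 < H; ++y)
        for (std::size_t x = 1; x + 1 < W; ++x)
            out[y * W + x] = 0.2f * (in[y * W + x] + in[y * W + x - 1] +
                                     in[y * W + x + 1] + in[(y - 1) * W + x] +
                                     in[(y + 1) * W + x]);
    return out;
}

// Streaming version: each input cell is read exactly once, in order.
// Only the most recent 2*W + 1 cells are kept in a shift buffer.
// In hardware this buffer would typically map to shift registers or
// BRAM-based FIFOs; here it is a plain vector shifted in software.
std::vector<float> stencil_window(const std::vector<float>& in,
                                  std::size_t W, std::size_t H) {
    assert(in.size() == W * H);
    std::vector<float> out(in);
    const std::size_t depth = 2 * W + 1;
    std::vector<float> win(depth, 0.0f);

    for (std::size_t i = 0; i < W * H; ++i) {
        // Shift the window by one cell and push the newest input at the end.
        for (std::size_t k = 0; k + 1 < depth; ++k) win[k] = win[k + 1];
        win[depth - 1] = in[i];

        if (i < 2 * W) continue;          // window not yet filled
        const std::size_t c = i - W;      // cell currently at the window center
        const std::size_t y = c / W, x = c % W;
        if (y == 0 || y + 1 >= H || x == 0 || x + 1 >= W) continue;  // boundary

        // win[W] is the center; win[0] and win[2*W] are the cells one row
        // above and below; win[W-1] and win[W+1] are the left/right neighbors.
        out[c] = 0.2f * (win[W] + win[W - 1] + win[W + 1] + win[0] + win[2 * W]);
    }
    return out;
}
```

Calling `stencil_window(grid, W, H)` on a row-major `std::vector<float>` of size `W * H` gives the same result as `stencil_naive`, while touching each input cell only once. The buffer depth grows with the grid width, which is one reason the buffer design affects on-chip memory cost.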

Stencil Computation

FPGA with HBM Memory

Related Work

Stencil Computation Architecture

Sliding Window Buffer Design Approaches

Scaling along the x Dimension of the Target Stencil Space

Scaling along the y Dimension of the Target Stencil Space

Hybrid Scaling Strategy

Proposed Architecture Overview

HBM Memory Bandwidth Optimization

Performance Model

Limitation

Experiment Setup

Experiment Performance

Conclusions


Journal: Electronics	Publication Date: Aug 8, 2020
Citations: 4	License type: CC BY 4.0


Similar Papers

RPython high-level synthesis
Maciej Linczuk ... Radoslaw Cieszewski
28 Sep 2016

Automated bug detection for pointers and memory accesses in High-Level Synthesis compilers
Pietro Fezzardi ... Fabrizio Ferrandi
01 Aug 2016

Thread-Aware Area-Efficient High-Level Synthesis Compiler for Embedded Devices
Changsu Kim ... Shinnung Jeong
27 Feb 2021

Exploiting Computation Reuse for Stencil Accelerators
Yuze Chi ... Jason Cong
01 Jul 2020
