Abstract
FPGA-based accelerators have recently evolved as strong competitors to the traditional GPU-based accelerators in modern high-performance computing systems. They offer both high computational capabilities and considerably lower energy consumption. High-level synthesis (HLS) can be used to overcome the main hurdle in the mainstream usage of the FPGA-based accelerators, i.e., the complexity of their design flow. HLS enables the designers to program an FPGA directly by using high-level languages, e.g., C, C++, SystemC, and OpenCL. This paper presents an HLS-based FPGA implementation of several algorithms from a variety of application domains. A performance comparison in terms of execution time, energy, and power consumption with some high-end GPUs is performed as well. The algorithms have been modeled in OpenCL for both GPU and FPGA implementation. We conclude that FPGAs are much more energy-efficient than GPUs in all the test cases that we considered. Moreover, FPGAs can sometimes be faster than GPUs by using an FPGA-specific OpenCL programming style and utilizing a variety of appropriate HLS directives.
Highlights
Modern electronic devices like smart phones are required to perform a variety of tasks ranging from simpler text messaging to more computationally intensive multimedia operations
field programmable gate array (FPGA) IMPLEMENTATION This section of the paper gives a brief overview of the Open Computing Language (OpenCL) programming framework, including its platform and memory model, followed by a detailed description of the design flow starting from the OpenCL code and terminating with final FPGA implementation
This paper performs an extensive analysis of the prospect of using high-level synthesis for implementing FPGA-based accelerators in modern high performance computing (HPC) systems
Summary
Modern electronic devices like smart phones are required to perform a variety of tasks ranging from simpler text messaging to more computationally intensive multimedia operations This has resulted in the development of heterogeneous system architectures in modern system-on-chip (SoC) designs. Such systems mitigate the issues encountered by multicore scaling (using several homogeneous cores), stemming mainly from the so called memory wall and Von Neumann bottleneck [1], [2]. Graphical processing units (GPUs) offer higher floating point throughput, a favorable architecture for data parallelism and higher memory bandwidth than processors These properties make them good candidates to be used as accelerators in modern high performance computing (HPC) systems [4]. The HPC systems using GPU-based accelerators are inefficient in terms of power consumption [5]
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.