Optimizing OpenCL Code for Performance on FPGA: k-Means Case Study With Integer Data Sets

Nuno Paulino, João Canas Ferreira, João M P Cardoso

doi:10.1109/access.2020.3017552

Abstract

High Level Synthesis (HLS) tools targeting Field Programmable Gate Arrays (FPGAs) aim to provide a method for programming these devices via high-level abstractions. Initially, HLS support for FPGAs focused on compiling C/C++ to hardware circuits. This raised the issue of determining the programming practices which resulted in the best performing circuits. Recently, to further increase the applicability of HLS approaches, renewed effort was placed on support for HLS of OpenCL code for FPGA, raising the same issues of coding practices and performance portability. This paper explores the performance of OpenCL code compiled for FPGAs for different coding techniques. We evaluate the use of task-kernels versus NDRange kernels, data vectorization, the use of on-chip local memories, and data transfer optimizations by exploiting burst access inference. We present this exploration via a case study of the k-means algorithm, and produce a total of 10 OpenCL implementations of the kernel. To determine the effects of different data set characteristics, and to determine the gains from specialization based on number of attributes, we generated a total of 12 integer data sets. The data sets vary regarding the number of instances, number of attributes (i.e., features), and number of clusters. We also vary the number of processing cores, and present the resulting required resources and operating frequencies. Finally, we execute the same OpenCL code on a 4 GHz Intel i7-6700K CPU, showing that the FPGA achieves speedups up to $1.54\times$ for four cases, and energy savings up to 80% in all cases.

Highlights

Unlike devices such as Central Processing Units (CPUs) and Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs) are reconfigurable, which allows for very finely tuned and application-specific implementations of circuits.
We evaluate the performance of OpenCL code on FPGA resulting from applying multiple coding techniques, including the use of single-task kernels versus NDRange kernels, combined with data vectorization and the use of local memories and burst accesses to local memory.
Data is exchanged between the CPU and the FPGA via the system memory, by resorting to traditional OpenCL API calls.

Summary

INTRODUCTION

Unlike devices such as Central Processing Units (CPUs) and Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs) are reconfigurable, which allows for very finely tuned and application-specific implementations of circuits. Despite this beneficial circuit specialization, the respective lack of programmability implies the same design effort for future revisions. In order to make these devices more suited for general use, over a decade of development has focused on efficient generation of circuits via High Level Synthesis (HLS) of source code such as (subsets of) C/C++ or MATLAB [4], [5]. We evaluate the performance of OpenCL code on FPGA resulting from applying multiple coding techniques, including the use of single-task kernels versus NDRange kernels, combined with data vectorization and the use of local memories and burst accesses to local memory. We study these aspects via the popular k-means algorithm [15].
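To make the single-task versus NDRange distinction concrete, the following is a minimal sketch in plain C of the k-means assignment step on integer data. It is not the paper's OpenCL code: the function names (`assign_single_task`, `assign_work_item`), the row-major data layout, and the use of squared Euclidean distance are assumptions chosen for illustration. The single-task form loops over all instances in one invocation, while the NDRange form processes one instance per work-item, with the `gid` argument standing in for the value an OpenCL kernel would obtain from `get_global_id(0)`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Squared Euclidean distance between two integer vectors of n_attr attributes.
 * (Assumed metric; the accumulator is widened to avoid integer overflow.) */
static long long sq_dist(const int *a, const int *b, size_t n_attr)
{
    long long acc = 0;
    for (size_t j = 0; j < n_attr; j++) {
        long long d = (long long)a[j] - (long long)b[j];
        acc += d * d;
    }
    return acc;
}

/* Index of the centroid nearest to one instance; ties go to the lowest index. */
static int nearest_centroid(const int *inst, const int *centroids,
                            size_t n_clusters, size_t n_attr)
{
    assert(n_clusters > 0);
    int best = 0;
    long long best_d = LLONG_MAX;
    for (size_t c = 0; c < n_clusters; c++) {
        long long d = sq_dist(inst, centroids + c * n_attr, n_attr);
        if (d < best_d) {
            best_d = d;
            best = (int)c;
        }
    }
    return best;
}

/* Single-task style (hypothetical name): one invocation walks every instance. */
void assign_single_task(const int *data, const int *centroids, int *labels,
                        size_t n_inst, size_t n_clusters, size_t n_attr)
{
    for (size_t i = 0; i < n_inst; i++)
        labels[i] = nearest_centroid(data + i * n_attr, centroids,
                                     n_clusters, n_attr);
}

/* NDRange style (hypothetical name): each work-item labels the single
 * instance selected by gid, which emulates get_global_id(0). */
void assign_work_item(size_t gid, const int *data, const int *centroids,
                      int *labels, size_t n_clusters, size_t n_attr)
{
    labels[gid] = nearest_centroid(data + gid * n_attr, centroids,
                                   n_clusters, n_attr);
}
```

On a host, the NDRange form can be emulated by calling `assign_work_item` once per `gid` from 0 to the number of instances minus one; both forms should then produce identical labels. How each form maps to hardware on an FPGA (pipelined loop versus replicated work-items) depends on the HLS tool and is not captured by this sketch.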

RELATED WORK

IMPLEMENTED CODE VERSIONS

FPGA VS CPU

COMPARISON TO C IMPLEMENTATION

DISCUSSION AND OBSERVATIONS

Result

CONCLUSION


Journal: IEEE access : practical innovations, open solutions
Publication Date: Jan 1, 2020
Citations: 3
License type: CC BY 4.0

Similar Papers

A Common Backend for Hardware Acceleration on FPGA
Emanuele Del Sozzo ... Saman Amarasinghe
01 Nov 2017

Efficient FPGA Cost-Performance Space Exploration using Type-Driven Program Transformations
Cristian Urlea ... Syed Waqar Nabi
01 Dec 2019

A Unified Backend for Targeting FPGAs from DSLs
Emanuele Del Sozzo ... Riyadh Baghdadi
01 Jul 2018

Hardware implementation of principal component analysis for gas identification systems on the Zynq SoC platform
Amine Ait Si Ali
01 Jan 2013
