Abstract

In this paper we evaluate the performance and energy effectiveness of FPGA and CPU devices for a kind of parallel computing applications in which the workload can be distributed in a way that enables simultaneous computing in addition to simple off loading. The FPGA device is programmed via OpenCL using the recent availability of commercial tools and hardware while Threading Building Blocks (TBB) is used to orchestrate the load distribution and balancing between FPGA and the multicore CPU. We focus on streaming applications that can be implemented as a pipeline of stages. We present an approach that allows the user to specify the mapping of the pipeline stages to the devices (FPGA, GPU or CPU) and the number of active threads. Using as a case study a real streaming application, we evaluate how these parameters affect the performance and energy efficiency using as reference a heterogeneous system that includes four different types of computational resources: a quad-core Intel Haswell CPU, an embedded Intel HD6000 GPU, a discrete NVIDIA GPU and an Altera FPGA.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call