Exploiting Task Parallelism with OpenCL: A Case Study

Pekka Jääskeläinen, Matias Koskela, Simon McIntosh-Smith, Cristóvão Cruz, Karen Egiazarian, James Price, Aram Danielyan, Ville Korhonen, Jarmo Takala

doi:10.1007/s11265-018-1416-1

Abstract

While the data-parallel aspects of OpenCL have been of primary interest because massively data-parallel GPUs have been in focus, OpenCL also provides powerful capabilities for describing task parallelism. In this article we study the task-parallel concepts available in OpenCL and examine how well different vendor-specific implementations can exploit task parallelism when it is described in various ways using command queues. We show that the vendor implementations are not yet capable of automatically extracting kernel-level task parallelism from in-order queues. To assess the potential performance benefits of in-order queue parallelization, we implemented such capabilities in an open source implementation of OpenCL. The evaluation was conducted as a case study of an advanced noise reduction algorithm described as a multi-kernel OpenCL application.

Highlights

OpenCL is a widely adopted programming standard for parallel heterogeneous systems
While the data-parallel aspects of OpenCL have been of primary interest to its users because massively parallel GPU devices have been in focus, OpenCL also provides extensive capabilities for describing heterogeneous task parallelism: commands are pushed to one or more command queues controlling one or more devices, and are synchronized using events, command queue barriers or kernel argument buffer data dependencies
The results suggest that AMD’s SDK does not currently make data-locality-aware scheduling decisions based on command queue dependencies; instead it schedules from the command queues “fairly”, which had a severe impact on the platforms in this case study that have more limited cache resources

Summary

Introduction

OpenCL is a widely adopted programming standard for parallel heterogeneous systems. The goal of the standard is to support a wide range of heterogeneous platforms efficiently and to provide source code portability across them. While the data-parallel aspects of OpenCL have been of primary interest to its users because massively parallel GPU devices have been in focus, OpenCL also provides extensive capabilities for describing heterogeneous task parallelism: commands are pushed to one or more command queues controlling one or more devices, and are synchronized using events, command queue barriers or kernel argument buffer data dependencies. We consider this side of the standard underutilized, even though it is the feature that allows the devices of a heterogeneous platform to collaboratively execute multi-kernel applications efficiently by reducing the “master role” of the host program.
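The idea of synchronizing through kernel argument buffer data dependencies can be illustrated with a small sketch. The Python below is not the authors' implementation and does not call any OpenCL API; it only models an in-order queue as a list of commands, each with an assumed set of buffers it reads and writes, and derives a task graph from read-after-write, write-after-read and write-after-write conflicts. The names `Command`, `build_task_graph`, `parallel_waves` and the four example kernels are hypothetical, and the analysis is deliberately conservative (whole-buffer granularity, all earlier commands compared pairwise).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """One enqueued command (hypothetical model, not an OpenCL type)."""
    name: str
    reads: frozenset   # names of buffers the command reads
    writes: frozenset  # names of buffers the command writes


def build_task_graph(queue):
    """Map each command index to the earlier indices it must wait for."""
    deps = {j: set() for j in range(len(queue))}
    for j, later in enumerate(queue):
        for i in range(j):
            earlier = queue[i]
            raw = earlier.writes & later.reads    # read after write
            war = earlier.reads & later.writes    # write after read
            waw = earlier.writes & later.writes   # write after write
            if raw or war or waw:
                deps[j].add(i)
    return deps


def parallel_waves(deps):
    """Group command indices into waves whose members are independent."""
    level = {}
    # Dependencies always point to smaller indices, so queue order is
    # already a valid topological order.
    for j in sorted(deps):
        level[j] = 1 + max((level[i] for i in deps[j]), default=-1)
    waves = {}
    for j, lvl in level.items():
        waves.setdefault(lvl, []).append(j)
    return [waves[lvl] for lvl in sorted(waves)]


# Hypothetical four-kernel pipeline enqueued to one in-order queue.
queue = [
    Command("split", frozenset({"input"}), frozenset({"a", "b"})),
    Command("filter_a", frozenset({"a"}), frozenset({"fa"})),
    Command("filter_b", frozenset({"b"}), frozenset({"fb"})),
    Command("merge", frozenset({"fa", "fb"}), frozenset({"output"})),
]

deps = build_task_graph(queue)
waves = parallel_waves(deps)
for wave in waves:
    print([queue[i].name for i in wave])
```

In this model the two filter kernels touch disjoint buffers, so they fall into the same wave even though the queue is in-order; a runtime that performs this kind of analysis could run them concurrently without the programmer adding explicit events.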

Platform-Wide Execution of Heterogeneous Task Graphs

Task Parallel Concepts in OpenCL

Converting Command Queues to Task Graphs

Constructing the Task Graph

Command Queue Data Dependence Analysis

Implementing a Task Scheduling Runtime

Dynamic Construction of Task Graphs

Dynamic Task Scheduling for Shared Memory Multicores

The Application

Tested Runtimes

Related Work

Conclusions

Findings

Journal: Journal of Signal Processing Systems
Publication Date: Oct 15, 2018
Citations: 12
License type: open-access
