Comparison of OpenMP & OpenCL Parallel Processing Technologies

Krishnahari Thouti, S. R. Sathe

doi:10.14569/ijacsa.2012.030410

Abstract

This paper compares OpenMP and OpenCL using parallel implementations of algorithms drawn from several areas of computer applications. The study focuses on benchmark performance under each model. We observed that the OpenCL programming model is a good option for mapping threads onto different processing cores. Balancing the load across all available cores and allocating a sufficient amount of work to each computing unit can improve performance. Our experiments ran on the Fedora operating system, on a system with an Intel Xeon dual-core processor with a thread count of 24, coupled with an NVIDIA Quadro FX 3800 graphics processing unit.

Highlights

Quad-core and multi-core processors and GPUs [1] have become the standard for both workstations and high-performance computers.
We compare the performance of these test cases with the OpenCL code on the GPU and on a multi-core CPU with OpenMP support.
The OpenMP 3.0 specification introduced support for recursion through the "task" construct. We find no significant improvement in performance, since most of the code to be parallelized is kept in a critical section, as in the listing that begins: int put(int Queens[], int row, int column)
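
The effect described in the last highlight can be sketched in C. This is not the paper's put() listing, which is not reproduced on this page; it is a minimal illustration under assumptions. The names is_safe, count_from, count_queens_tasks, and the serialize flag are hypothetical. One OpenMP task is created per first-row column. With serialize set, each task does its whole search inside a critical section, so the tasks run one at a time and extra cores bring little benefit. With serialize cleared, each task searches independently and only the final addition is synchronized.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_N 16  /* assumed upper bound on board size for this sketch */

/* Return 1 if a queen at (row, col) does not attack queens[0..row-1]. */
static int is_safe(const int queens[], int row, int col)
{
    for (int r = 0; r < row; ++r) {
        int c = queens[r];
        if (c == col || abs(c - col) == row - r)
            return 0;
    }
    return 1;
}

/* Serial backtracking: count the ways to complete a partial placement. */
static long count_from(int queens[], int row, int n)
{
    if (row == n)
        return 1;
    long count = 0;
    for (int col = 0; col < n; ++col) {
        if (is_safe(queens, row, col)) {
            queens[row] = col;
            count += count_from(queens, row + 1, n);
        }
    }
    return count;
}

/* Count N-Queens solutions with one task per first-row column.
 * serialize != 0: the whole search sits in a critical section,
 *                 so tasks effectively execute one after another.
 * serialize == 0: tasks search in parallel; only the sum is atomic. */
long count_queens_tasks(int n, int serialize)
{
    assert(n >= 1 && n <= MAX_N);
    long total = 0;

    #pragma omp parallel
    #pragma omp single
    {
        for (int col = 0; col < n; ++col) {
            #pragma omp task firstprivate(col) shared(total)
            {
                int queens[MAX_N];  /* private board for this task */
                queens[0] = col;
                if (serialize) {
                    #pragma omp critical
                    total += count_from(queens, 1, n);
                } else {
                    long local = count_from(queens, 1, n);
                    #pragma omp atomic
                    total += local;
                }
            }
        }
    }   /* all tasks have finished by the end of the parallel region */
    return total;
}
```

Both variants return the same count; only the amount of work that can overlap differs. Compiled without OpenMP support, the pragmas are ignored and the function runs serially.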

Summary

INTRODUCTION

Quad-core and multi-core processors and GPUs [1] have already become the standard for both workstations and high-performance computers. These systems use aggressive multithreading so that whenever a thread stalls waiting for data, the processor can efficiently switch to executing another thread. Given this diversity of high-performance architectures, the question is which one best fits a given workload. The extent to which an application benefits from these systems depends on the availability of cores and other workload parameters. This paper addresses these issues by implementing parallel algorithms for four test cases and comparing their performance in terms of execution time and the percentage speed-up achieved.
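
The two metrics named above can be sketched as small C helpers. The page does not give the paper's exact formulas, so the definitions here are assumptions: speed-up as serial time divided by parallel time, and the percentage as the reduction in execution time relative to the serial run. The function names and the timings in the usage note are hypothetical.

```c
#include <assert.h>

/* Speed-up factor: how many times faster the parallel run is
 * (assumed definition: t_serial / t_parallel). */
double speedup_factor(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Percentage of execution time saved relative to the serial run
 * (assumed definition: 100 * (t_serial - t_parallel) / t_serial). */
double percent_time_saved(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return 100.0 * (t_serial - t_parallel) / t_serial;
}
```

For example, with hypothetical timings of 8.0 s serial and 2.0 s parallel, speedup_factor gives 4.0 and percent_time_saved gives 75.0.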

PARALLEL COMPUTING PARADIGM

Shared Memory System

Distributed Memory System

EXPERIMENTAL RESULTS

Matrix Multiplication

Image Convolution

String Reversal

RELATED WORK

CONCLUSION & FUTURE SCOPE

Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2012
Citations: 17
License type: cc-by
