Impact of CUDA and OpenCL on Parallel and Distributed Computing

Abu Asaduzzaman, C Aldershof, Alec Trent, S Osborne, Fadi N Sibai

doi:10.1109/iceee52452.2021.9415927

Abstract

Alongside high-performance computer systems, the choice of Application Programming Interface (API) is crucial to developing efficient solutions for modern parallel and distributed computing. Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL) are two popular APIs that allow General Purpose Graphics Processing Units (GPGPUs, GPUs for short) to accelerate processing in applications that support them. This paper presents a comparative study of OpenCL and CUDA and their impact on parallel and distributed computing. Three workloads are implemented using OpenCL (version 2.x) and CUDA (version 9.x) and run on a Linux-based High Performance Computing (HPC) system: Mandelbrot set generation (representing complex-number arithmetic), the Marching Squares algorithm (representing embarrassingly parallel computation), and the Bitonic Sorting algorithm (representing distributed computing). The HPC system uses an Intel i7-9700K processor and an NVIDIA GTX 1070 GPU card. Experimental results from 25 different tests across the three workloads are analyzed. According to these results, CUDA performs better than OpenCL, with a speedup of up to 7.34x. In most cases, however, OpenCL performs at an acceptable rate, with the CUDA speedup over OpenCL staying below 2x.
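As an illustration of the Bitonic Sorting workload named above, the following is a minimal host-side C++ sketch of a bitonic sorting network. It is not the authors' CUDA or OpenCL implementation; the function name `bitonic_sort`, the use of `int` keys, and the power-of-two size requirement are assumptions made for this example. The innermost loop is written so that each index `i` does independent work, which is the part a GPU implementation would typically assign to one thread per index.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: sorts `data` in ascending order with a bitonic network.
// Assumption: data.size() is a power of two (pad the input otherwise).
void bitonic_sort(std::vector<int>& data) {
    const std::size_t n = data.size();
    assert((n & (n - 1)) == 0 && "size must be a power of two");

    // k: length of the bitonic sequences being merged in this stage.
    for (std::size_t k = 2; k <= n; k <<= 1) {
        // j: distance between the two elements of each compare-exchange pair.
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {
            // Every i is independent within one (k, j) step; on a GPU this
            // loop would usually become one thread per index.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner <= i) continue;  // handle each pair only once

                // Blocks of length k alternate between ascending and descending.
                const bool ascending = (i & k) == 0;
                const bool out_of_order = ascending ? data[i] > data[partner]
                                                    : data[i] < data[partner];
                if (out_of_order) std::swap(data[i], data[partner]);
            }
        }
    }
}
```

The sequence of compare-exchange pairs depends only on the input size, not on the data, so the same fixed pattern can be executed in lockstep by many threads; each `(k, j)` step must finish before the next one begins.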

Full Text