Abstract

The aim of this paper is to evaluate the performance of the newer CUDA mechanisms—unified memory and dynamic parallelism—for real parallel applications, compared to versions using the standard CUDA API. To gain insight into the performance of these mechanisms, we implemented three applications with control and data flow typical of the SPMD, geometric SPMD and divide-and-conquer schemes, which were then used for tests and experiments. Specifically, the tested applications are verification of Goldbach’s conjecture, 2D heat transfer simulation and adaptive numerical integration. We experimented with various ways in which dynamic parallelism can be deployed into an existing implementation and optimized further. Subsequently, we compared the best dynamic parallelism and unified memory versions to their standard API counterparts. We show that dynamic parallelism improved performance for the heat simulation, performed better than a static version but worse than an iterative version for numerical integration, and gave worse results for Goldbach’s conjecture verification. In most cases, unified memory decreased performance. On the other hand, both mechanisms can contribute to simpler and more readable code: dynamic parallelism in algorithms to which it applies naturally, and unified memory by easing a programmer's entry into the CUDA programming paradigm, as it resembles the traditional memory allocation/usage pattern.
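To illustrate the last point, the sketch below contrasts the traditional CUDA allocation pattern with unified memory. The kernel and sizes are illustrative assumptions, not code from the paper: a single `cudaMallocManaged` allocation is visible to both host and device, removing the explicit `cudaMalloc`/`cudaMemcpy` pair.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel (not from the paper): increments each element.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1024;
    int *data;
    // One allocation accessible from both CPU and GPU;
    // the CUDA driver migrates pages on demand.
    cudaMallocManaged(&data, n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;   // plain host-side writes
    addOne<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();                   // required before host reads
    printf("data[0] = %d\n", data[0]);         // prints 1
    cudaFree(data);
    return 0;
}
```

Without unified memory, the same program needs a host buffer, a device buffer, and two `cudaMemcpy` calls—the extra bookkeeping the abstract refers to.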

Highlights

  • Heterogeneous computer systems have gained increasing popularity

  • Dynamic parallelism (DP) and unified memory (UM) mechanisms available in the CUDA API and platform were compared for three different applications, representative of: SPMD processing—verification of Goldbach’s conjecture, geometric SPMD processing—heat transfer simulation in 2D space, and divide-and-conquer processing—adaptive numerical integration of a function over a given range

  • A detailed analysis was presented for each application, with several optimizations applied and results compared with and without DP and UM
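Dynamic parallelism, highlighted above for divide-and-conquer processing, lets a kernel launch child kernels directly from the device. The sketch below is an assumed illustration of this pattern for adaptive subdivision of an interval; the names and structure are hypothetical, not the paper's implementation, and the error estimate is omitted.

```cuda
#include <cuda_runtime.h>

#define MAX_DEPTH 4  // illustrative recursion limit

// Hedged sketch of dynamic parallelism: a parent kernel subdivides
// its interval by launching child kernels from the device, with no
// host round-trip between refinement levels.
// Requires compute capability >= 3.5 and compilation with -rdc=true.
__global__ void refine(float a, float b, int depth) {
    // ... evaluate an error estimate for [a, b] here (omitted) ...
    if (depth < MAX_DEPTH /* && error still too large */) {
        float mid = 0.5f * (a + b);
        // Device-side child launches on the two half-intervals.
        refine<<<1, 1>>>(a, mid, depth + 1);
        refine<<<1, 1>>>(mid, b, depth + 1);
    }
}

int main() {
    refine<<<1, 1>>>(0.0f, 1.0f, 0);  // initial host-side launch
    cudaDeviceSynchronize();
    return 0;
}
```

This maps naturally onto adaptive numerical integration, where the subdivision depth is data-dependent and unknown to the host in advance.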



Introduction

Heterogeneous computer systems have gained more and more popularity. Almost every user of a personal computer has, beyond the standard CPU, an additional compute device with significant computing power: the GPU. Because of the popularity and tremendous potential of such devices, it is crucial to make the most of them and to assess how beneficial new features of the technology are. Using a GPU can even save days of computation, for instance during neural network training [27]. For these reasons, new tools and platforms are released to provide better and simpler ways to create applications and programs. This is especially important for people whose main profession is not programming, i.e., domain specialists.
