Abstract

Solving high-dimensional population balance models (PBMs) on CPUs is often computationally intractable. This study develops a scalable algorithm that parallelizes the nested loops inside the PBM using a GPU framework. The developed PBM is unique in that it adapts to the size of the problem and scales its use of GPU cores accordingly. The algorithm was written in CUDA® and C/C++ and therefore targets NVIDIA® GPUs. The major bottleneck of such algorithms is the communication time between the CPU and the GPU. In our studies, communication accounted for less than 1% of the total run time, and a maximum speedup of about 12× over the serial CPU code was achieved. On a desktop computer, the GPU PBM was about twice as fast as the PBM's multi-core CPU configuration. Speed improvements are also reported for various CPU and GPU architectures and configurations.
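The abstract does not include the kernel itself. As a rough illustration of how a nested PBM loop over pairs of bins can be flattened so that the number of GPU threads follows the problem size, here is a host-side C++ sketch that emulates the index arithmetic of a CUDA grid-stride loop. The names `chooseLaunch` and `emulateGridStride`, the cap on the number of blocks, and the square `nBins × nBins` loop shape are all hypothetical assumptions made for illustration; this is not the authors' implementation. In a real CUDA kernel, `tid` would be `blockIdx.x * blockDim.x + threadIdx.x` and `stride` would be `gridDim.x * blockDim.x`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D launch configuration.
struct LaunchConfig {
    int threadsPerBlock;
    int blocks;
};

// Hypothetical helper: size the grid to the amount of work (ceil division),
// but never exceed an assumed maximum number of blocks.
LaunchConfig chooseLaunch(std::size_t work, int threadsPerBlock, int maxBlocks) {
    assert(threadsPerBlock > 0 && maxBlocks > 0);
    std::size_t needed = (work + threadsPerBlock - 1) / threadsPerBlock;
    std::size_t capped = std::min<std::size_t>(needed, static_cast<std::size_t>(maxBlocks));
    int blocks = std::max(static_cast<int>(capped), 1);
    return {threadsPerBlock, blocks};
}

// Emulates on the host what each GPU thread would do: thread `tid` handles
// flat indices tid, tid + stride, tid + 2*stride, ... and maps each flat
// index k back to the (i, j) pair of the original nested loop.
// The "kernel body" here only counts how often each (i, j) pair is visited.
std::vector<int> emulateGridStride(int nBins, const LaunchConfig& cfg) {
    const std::size_t n = static_cast<std::size_t>(nBins);
    const std::size_t total = n * n;
    const std::size_t stride =
        static_cast<std::size_t>(cfg.blocks) * static_cast<std::size_t>(cfg.threadsPerBlock);
    std::vector<int> visits(total, 0);
    for (std::size_t tid = 0; tid < stride; ++tid) {        // one pass per emulated thread
        for (std::size_t k = tid; k < total; k += stride) {  // grid-stride loop
            std::size_t i = k / n;                           // outer-loop index
            std::size_t j = k % n;                           // inner-loop index
            visits[i * n + j] += 1;                          // stand-in for the (i, j) PBM term
        }
    }
    return visits;
}
```

For example, with 100 bins there are 10,000 pair computations. At 256 threads per block the helper requests 40 blocks, but if the cap is 16 blocks, each emulated thread simply loops over more flat indices. In both cases every (i, j) pair is still visited exactly once. Capping the grid and striding over the work lets the same kernel serve both small and large problems without changing its code.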
