TuCCompi: A Multi-layer Model for Distributed Heterogeneous Computing with Tuning Capabilities

Hector Ortega-Arranz, Diego R Llanos, Yuri Torres, Arturo Gonzalez-Escribano

doi:10.1007/s10766-015-0349-6

Open Access

Abstract

During the last decade, parallel processing architectures have become a powerful tool for tackling massively parallel problems that require high performance computing (HPC). The latest trend in HPC is the use of heterogeneous environments that combine different computational processing devices, such as CPU cores and graphics processing units (GPUs). Maximizing the performance of a GPU implementation of an algorithm requires in-depth knowledge of the underlying GPU architecture, which makes it a tedious manual effort suited only to experienced programmers. In this paper, we present TuCCompi, a multi-layer abstract model that simplifies programming on heterogeneous systems that include hardware accelerators by hiding the details of synchronization, deployment, and tuning. TuCCompi chooses optimal values for the GPU configuration parameters using a kernel characterization provided by the programmer. This model is well suited to problems composed of independent tasks with a high computational load, such as embarrassingly parallel problems. We have evaluated TuCCompi in different real-world heterogeneous environments using the all-pair shortest-path problem as a case study.

Highlights

Some compute-intensive problems can be divided into many independent tasks that run in parallel without any communication among them.
To support the massive demand for High Performance Computing (HPC), recent trends focus on heterogeneous environments that include computational units of different kinds, such as conventional CPU cores, graphics processing units (GPUs), and other hardware accelerators.
We present TuCCompi (Tuned, Concurrent CUDA, OpenMP and MPI), a multi-layer, skeleton-based abstract model that transparently exploits heterogeneous systems and makes the most of GPU capabilities by automatically choosing optimal values for important configuration parameters.

Summary

Introduction

Some compute-intensive problems can be divided into many independent tasks that run in parallel without any communication among them. To support the massive demand for HPC, recent trends focus on heterogeneous environments that include computational units of different kinds, such as conventional CPU cores, graphics processing units (GPUs), and other hardware accelerators. Exploiting these environments offers higher peak performance and better efficiency than classical homogeneous cluster systems [6]. Since the cost of building heterogeneous systems is low, they are being incorporated into many different computational environments, from academic research clusters to supercomputing centers.
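To make the idea of independent, communication-free tasks concrete, the following minimal Python sketch treats the all-pair shortest-path problem (the case study named in the abstract) as one single-source shortest-path task per node. It is an illustration under stated assumptions, not TuCCompi's interface: the function names `sssp` and `apsp`, the use of Dijkstra's algorithm for each task, and the thread pool standing in for heterogeneous CPU and GPU workers are all hypothetical choices made for this example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def sssp(graph, source):
    """Dijkstra from one source; graph maps node -> list of (neighbor, weight)."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def apsp(graph, workers=4):
    """Solve all-pair shortest paths as one independent task per source node.

    Assumption: a thread pool stands in for the pool of heterogeneous
    devices; tasks share only the read-only graph and never communicate.
    """
    sources = list(graph)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = pool.map(lambda s: sssp(graph, s), sources)
        return dict(zip(sources, rows))


# Small directed graph with non-negative weights (hypothetical data).
GRAPH = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2)],
    "c": [("a", 7)],
}

if __name__ == "__main__":
    for src, row in apsp(GRAPH).items():
        print(src, row)
```

Because each task reads the shared graph and writes only its own result row, the number of workers changes how the tasks are scheduled but not the distances computed, which is the property that lets such workloads be spread across devices of different speeds.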

Methods

Results

Conclusion