Abstract

High-performance computing on low-cost machines has become a reality with GPUs. Unfortunately, high performance is achieved only when the programmer exploits the architectural specificities of GPU processors: the programmer must manage inter-GPU communication, task allocation among the GPUs, task scheduling, external memory prefetching, and synchronization. In this paper, we propose and evaluate a compilation flow that automates the transformation of a program expressed in the high-level system design language SystemC into its implementation on a multi-GPU cluster. SystemC constructs and the SystemC scheduler are mapped directly to the GPU API, preserving their semantics. Inter-GPU communications are abstracted by means of SystemC channels.

