Abstract

Because of technology and capital cost limitations, supercomputer systems are becoming increasingly complex. These systems provide the expected compute capability at the cost of deeper memory hierarchies, heterogeneous compute elements, and heterogeneous memories. Users of these systems must determine the mapping of MPI tasks, OpenMP/POSIX threads, and OpenMP/CUDA kernels to the underlying hardware resources. Not only can this be challenging, but when the same application is executed on a different system, the mapping will likely have to change to attain reasonable performance. This work presents a memory-centric algorithm that maps a parallel hybrid application to the underlying hardware resources transparently, efficiently, and portably from the application's point of view. There are two fundamental aspects of this algorithm. First, unlike existing mappings, its primary design point is the memory system: compute elements are selected based on the identified memory components, and not vice versa. Second, it embodies a global awareness of hybrid programming abstractions as well as heterogeneous devices.
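The memory-first idea can be illustrated with a minimal sketch. All names below (`NumaDomain`, `map_tasks`) are hypothetical and not the paper's API; the sketch only shows the inversion the abstract describes: tasks are assigned to memory domains first, and the CPU/GPU sets follow from the chosen domain rather than the other way around.

```python
# Hypothetical sketch of memory-centric mapping; names are
# illustrative, not taken from the paper's implementation.
from dataclasses import dataclass, field

@dataclass
class NumaDomain:
    """A memory component with its locally attached compute elements."""
    id: int
    cpus: list                              # core IDs local to this domain
    gpus: list = field(default_factory=list)

def map_tasks(num_tasks, domains):
    """Assign MPI tasks to memory domains first; the compute
    elements for each task are derived from its domain."""
    mapping = {}
    for task in range(num_tasks):
        dom = domains[task % len(domains)]  # distribute over memory domains
        mapping[task] = {
            "domain": dom.id,
            "cpus": dom.cpus,               # threads inherit memory locality
            "gpus": dom.gpus,
        }
    return mapping

# Example machine: two NUMA domains, each with four cores and one GPU
domains = [NumaDomain(0, [0, 1, 2, 3], [0]),
           NumaDomain(1, [4, 5, 6, 7], [1])]
print(map_tasks(4, domains))
```

With four tasks on this two-domain machine, tasks 0 and 2 land on domain 0 and tasks 1 and 3 on domain 1, so each task's threads and kernels stay local to the memory it was placed on.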
