POSTER

Shigang Li, Torsten Hoefler, Yunquan Zhang

doi:10.1145/3155284.3019025

Journal: ACM SIGPLAN Notices
Publication Date: Jan 26, 2017
Citations: 3

Affiliation: Institute of Computing Technology, Chinese Academy of Sciences, ETH Zurich

#Many-core Era #Experimental Results

Abstract

In the many-core era, the performance of MPI collectives depends increasingly on intra-node communication. However, intra-node communication algorithms are generally inherited from their inter-node counterparts and ignore cache complexity. We propose cache-oblivious algorithms for MPI all-to-all operations, in which data blocks are copied into the receive buffers in Morton order to exploit data locality. Experimental results on different many-core architectures show that our cache-oblivious implementations significantly outperform both naive implementations based on a shared heap and highly optimized MPI libraries.
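
To make the Morton-order idea concrete, the following is a minimal single-process sketch, not the authors' implementation. It assumes an all-to-all among p ranks can be viewed as a p x p grid of block copies, where the block at (src, dst) moves from send[src][dst] to recv[dst][src], and it visits that grid along a Z-order (Morton) curve instead of row by row. The names morton_decode, morton_pairs, and alltoall_morton are hypothetical, and plain Python lists stand in for the shared heap.

```python
def morton_decode(z):
    """Split the interleaved bits of z into a (row, col) pair."""
    row = col = 0
    bit = 0
    while z:
        col |= (z & 1) << bit          # even bits of z form the column
        row |= ((z >> 1) & 1) << bit   # odd bits of z form the row
        z >>= 2
        bit += 1
    return row, col


def morton_pairs(p):
    """Yield every (src, dst) pair for p ranks in Morton (Z) order."""
    side = 1
    while side < p:                    # round the grid up to a power of two
        side *= 2
    for z in range(side * side):
        src, dst = morton_decode(z)
        if src < p and dst < p:        # skip the padding cells
            yield src, dst


def alltoall_morton(send):
    """Simulate all-to-all: send[src][dst] ends up in recv[dst][src].

    Assumption: all buffers are visible to one copier, as in a
    shared-heap setting; here that is a single Python process.
    """
    p = len(send)
    recv = [[None] * p for _ in range(p)]
    for src, dst in morton_pairs(p):
        recv[dst][src] = send[src][dst]
    return recv


if __name__ == "__main__":
    send = [[f"{i}->{j}" for j in range(4)] for i in range(4)]
    for row in alltoall_morton(send):
        print(row)
```

Because consecutive positions on the Z-order curve stay within small square sub-grids, successive copies touch nearby regions of both the send and receive buffers at every scale, which is the locality property a cache-oblivious traversal relies on. A real implementation would distribute these copies across ranks and operate on raw memory rather than Python objects.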

Full Text