Optimizing communication for Charm++ applications by reducing network contention

Abhinav Bhatelé, Laxmikant V Kalé, Eric Bohm

doi:10.1002/cpe.1637

Abstract

Optimal network performance is critical for efficient parallel scaling of communication‐bound applications on large machines. With wormhole routing, no‐load latencies do not increase significantly with the number of hops traveled. Yet, we and others have recently shown that in the presence of contention, message latencies can grow substantially. Hence, task mapping strategies on large machines should take the topology of the machine into account. In this paper, we present topology‐aware mapping as a technique to optimize communication on three‐dimensional mesh interconnects and hence improve performance. Our methodology is facilitated by the object‐based decomposition used in Charm++, which separates the decomposition of computation from its mapping to processors and allows a more flexible mapping based on communication patterns between objects. Exploiting this and the topology of the allocated job partition, we present mapping strategies for a production code, OpenAtom, to improve its overall performance and scaling. OpenAtom presents complex communication scenarios involving interactions among multiple groups of objects, which makes the mapping task a challenge. Results are presented for OpenAtom on up to 16 384 processors of Blue Gene/L, 8192 processors of Blue Gene/P and 2048 processors of Cray XT3. Copyright © 2010 John Wiley & Sons, Ltd.
