Optimizing Remote Communication in X10

Arun Thangamani, V Krishna Nandivada

doi:10.1145/3345558

Abstract

X10 is a partitioned global address space programming language that supports the notion of places; a place consists of some data and some lightweight tasks called activities. Each activity runs at a place and may invoke a place-change operation (using the at-construct) to synchronously perform some computation at another place. These place-change operations can be very expensive, as they need to copy all the required data from the current place to the remote place. However, identifying the necessary number of place-change operations and the data required during each place-change operation are non-trivial tasks, especially in the context of irregular applications (like graph applications) that contain complex code with large numbers of cross-referencing objects, not all of which may actually be required at the remote place. In this article, we present AT-Com, a scheme to optimize X10 code with place-change operations. AT-Com consists of two inter-related new optimizations: (i) AT-Opt, which minimizes the amount of data serialized and communicated during place-change operations, and (ii) AT-Pruning, which identifies and elides redundant place-change operations and executes place-change operations in parallel. AT-Opt uses a novel abstraction, called abstract-place-tree, to capture place-change operations in the program. For each place-change operation, AT-Opt uses a novel inter-procedural analysis to precisely identify the data required at the remote place in terms of the variables in the current scope. AT-Opt then emits the appropriate code to copy the identified data items to the remote place. AT-Pruning introduces a set of program transformation techniques to emit optimized code that avoids the redundant place-change operations. We have implemented AT-Com in the x10v2.6.0 compiler and tested it over the IMSuite benchmark kernels. Compared to the current X10 compiler, the AT-Com optimized code achieved a geometric mean speedup of 18.72× and 17.83× on a four-node (32 cores per node) Intel and two-node (16 cores per node) AMD system, respectively.
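The cost that the abstract attributes to place-change operations can be illustrated with a small analogy. The sketch below is not X10 and does not reproduce the AT-Opt analysis: it uses Python's `pickle` as a stand-in for the serialization performed at a place change, and the ring-shaped graph, the field names (`id`, `payload`, `next`), and the helper functions are all hypothetical. It only shows why serializing an object that cross-references others is much more expensive than serializing the single field the remote computation reads.

```python
import pickle


def build_ring(n):
    """Build n hypothetical graph vertices, each pointing at the next one."""
    nodes = [{"id": i, "payload": [0] * 100, "next": None} for i in range(n)]
    for i, node in enumerate(nodes):
        node["next"] = nodes[(i + 1) % n]
    return nodes


def bytes_for_whole_object(node):
    # Serializing one vertex drags along every vertex reachable from it,
    # which is the analogue of copying a whole cross-referencing object.
    return len(pickle.dumps(node))


def bytes_for_needed_field(node):
    # Serialize only the field the (assumed) remote computation reads.
    return len(pickle.dumps(node["id"]))


nodes = build_ring(20)
naive = bytes_for_whole_object(nodes[0])
pruned = bytes_for_needed_field(nodes[0])
print("whole object:", naive, "bytes; needed field only:", pruned, "bytes")
```

In this analogy the naive size grows with the number of reachable vertices, while the pruned size stays constant. Deciding automatically which fields are needed is the part the article attributes to its inter-procedural analysis, and this sketch does not attempt it.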

More From: ACM Transactions on Architecture and Code Optimization
