Abstract

The Cray Gemini Interconnect has recently been introduced as a next-generation network architecture for building multi-petaflop supercomputers. Cray XE6 systems including LANL Cielo, NERSC Hopper, and the proposed NCSA Blue Waters, as well as the Cray XK6 ORNL Titan, leverage the Gemini Interconnect as their primary interconnection network. At the same time, programming models such as the Message Passing Interface (MPI) and Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) and Co-Array Fortran (CAF) have become available on these systems. Global Arrays is a popular PGAS model used in a variety of application domains including hydrodynamics, chemistry, and visualization. Global Arrays uses the Aggregate Remote Memory Copy Interface (ARMCI) as its communication runtime system for Remote Memory Access (RMA) communication. This paper presents the design, implementation, and performance evaluation of scalable, high-performance communication for ARMCI on Cray Gemini. The design space is explored, and the time and space complexities of communication protocols for one-sided communication primitives, including transfers of contiguous and uniformly non-contiguous datatypes, atomic memory operations (AMOs), and memory synchronization, are presented. An implementation of the proposed design (referred to as ARMCI-Gemini) demonstrates its efficacy on communication primitives, application kernels such as LU decomposition, and applications such as Smooth Particle Hydrodynamics (SPH).

