Abstract

Achieving high performance with grand challenge applications on today's large-scale parallel systems requires tailoring applications to the characteristics of modern microprocessor architectures. As part of the US Department of Energy's Scientific Discovery through Advanced Computing (SciDAC) program, we studied and tuned the Gyrokinetic Toroidal Code (GTC), a particle-in-cell code developed at the Princeton Plasma Physics Laboratory for simulating turbulent transport of particles and energy in burning plasma. In this paper, we present a performance study of GTC that revealed several opportunities to improve performance by enhancing data locality. We tuned GTC with three kinds of transformations: static data structure reorganization to improve spatial locality, loop nest restructuring to improve temporal locality, and dynamic data reordering at run time to enhance both spatial and temporal reuse. Experimental results show that these changes reduce execution time by more than 20% on large parallel systems, including a Cray XT4.
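To make the third transformation concrete, the following is a minimal sketch of dynamic data reordering in a particle-in-cell setting. It is not taken from GTC: the function name `reorder_particles_by_cell`, the per-particle field names, and the use of a plain stable sort are all assumptions chosen for illustration. The idea is that particles are periodically permuted so that those residing in the same grid cell sit next to each other in memory, so that consecutive particles touch the same grid data.

```python
def reorder_particles_by_cell(cell_of, fields):
    """Permute per-particle arrays so particles in the same cell are adjacent.

    cell_of: list of grid-cell indices, one per particle (hypothetical layout).
    fields:  dict mapping a field name to a list of per-particle values.
    Returns the reordered cell list and a dict of reordered field lists.
    """
    # Python's sort is stable, so particles keep their relative order
    # within a cell.
    order = sorted(range(len(cell_of)), key=lambda i: cell_of[i])

    # Apply the same permutation to every per-particle array.
    reordered_fields = {
        name: [values[i] for i in order] for name, values in fields.items()
    }
    reordered_cells = [cell_of[i] for i in order]
    return reordered_cells, reordered_fields


# Illustrative data only: five particles spread over three cells.
cells = [2, 0, 1, 0, 2]
fields = {"x": [0.9, 0.1, 0.5, 0.2, 0.8], "v": [1.0, 2.0, 3.0, 4.0, 5.0]}
new_cells, new_fields = reorder_particles_by_cell(cells, fields)
print(new_cells)  # [0, 0, 1, 2, 2]
```

A production code would more likely use a linear-time counting sort over cell indices and reorder only every few time steps, since the cost of the permutation must be recovered through better cache reuse in the following steps.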
