Abstract

We report on the successful completion of a 2 trillion particle cosmological simulation to z=0 run on the Piz Daint supercomputer (CSCS, Switzerland), using 4000+ GPU nodes for a little less than 80 h of wall-clock time or 350,000 node hours. Using multiple benchmarks and performance measurements on the US Oak Ridge National Laboratory Titan supercomputer, we demonstrate that our code PKDGRAV3 delivers, to our knowledge, the fastest time-to-solution for large-scale cosmological N-body simulations. This was made possible by using the Fast Multipole Method in conjunction with individual and adaptive particle time steps, both deployed efficiently (and for the first time) on supercomputers with GPU-accelerated nodes. The very low memory footprint of PKDGRAV3 allowed us to run the first ever benchmark with 8 trillion particles on Titan, and to achieve perfect scaling up to 18,000 nodes and a peak performance of 10 Pflops.

Highlights

  • We report on the successful completion of a 2 trillion particle cosmological simulation to z = 0 run on the Piz Daint supercomputer (CSCS, Switzerland), using 4000+ GPU nodes for a little less than 80 h of wall-clock time or 350,000 node hours

  • Because of the non-linear nature of gravity on these scales, our best theoretical predictions make use of N-body simulations: the dark matter fluid is sampled in phase space using as many macro-particles as possible, each representing a large ensemble of true, microscopic dark matter particles that evolve collisionlessly under their mutual gravitational attraction

  • We report on the successful evolution of a 2 trillion particle simulation of the ΛCDM model to z = 0 in less than 80 h of wall-clock time, including on-the-fly analysis, performed on the Swiss National Supercomputing Centre machine, Piz Daint, using 4000+ GPU-accelerated nodes
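The highlight above summarizes the N-body approach: macro-particles sampling the dark matter phase-space distribution and evolving under their mutual gravity. As an illustration only (not PKDGRAV3's method, which uses the Fast Multipole Method rather than direct summation), here is a minimal sketch of a softened direct-summation gravity solver with a symplectic leapfrog step, in G = 1 units; all function names and the softening value are our own choices:

```python
import numpy as np

def accelerations(pos, mass, eps=1e-2):
    """Direct O(N^2) summation of softened gravitational accelerations.

    pos  : (N, 3) array of particle positions
    mass : (N,) array of particle masses (G = 1 units)
    eps  : Plummer softening length, regularizing close encounters
    """
    diff = pos[None, :, :] - pos[:, None, :]      # pairwise r_j - r_i
    dist2 = (diff ** 2).sum(axis=-1) + eps ** 2   # softened squared distances
    inv_r3 = dist2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)                 # exclude self-interaction
    # a_i = sum_j m_j (r_j - r_i) / |r_j - r_i|^3
    return (diff * (mass[None, :, None] * inv_r3[:, :, None])).sum(axis=1)

def leapfrog_step(pos, vel, mass, dt):
    """One kick-drift-kick leapfrog step (second order, symplectic)."""
    vel = vel + 0.5 * dt * accelerations(pos, mass)   # half kick
    pos = pos + dt * vel                              # drift
    vel = vel + 0.5 * dt * accelerations(pos, mass)   # half kick
    return pos, vel
```

Because the pairwise forces are antisymmetric, this scheme conserves total momentum to machine precision; the O(N^2) cost is exactly what tree and multipole methods such as FMM are designed to avoid at trillion-particle scale.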

Summary

Introduction

We report on the successful evolution of a 2 trillion particle simulation of the ΛCDM model to z = 0 in less than 80 h of wall-clock time, including on-the-fly analysis, performed on the Swiss National Supercomputing Centre machine, Piz Daint, using 4000+ GPU-accelerated nodes (see Figure ). The main innovations presented in this paper are (1) a highly performant version of the FMM algorithm, with a measured peak performance of 10 Pflops, and (2) an optimal use of the available memory, allowing us to reach 8 trillion particles on the 18,000 nodes of the Titan supercomputer.
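The key idea behind multipole methods such as FMM is that a distant group of particles can be replaced by a truncated multipole expansion about its center, turning many pairwise interactions into one cell-to-point (or cell-to-cell) evaluation. The following toy sketch (our own illustration, not PKDGRAV3 code) compares the exact potential of a particle cluster with its lowest-order (monopole) approximation about the center of mass, where the dipole term vanishes identically:

```python
import numpy as np

def exact_potential(target, src_pos, src_mass):
    """Exact potential at `target` from all source particles (G = 1)."""
    r = np.linalg.norm(src_pos - target, axis=1)
    return -(src_mass / r).sum()

def monopole_potential(target, src_pos, src_mass):
    """Monopole approximation: the whole cell collapsed to one point.

    Expanding about the center of mass makes the dipole term vanish,
    so the leading error is quadrupole, O((cell size / distance)^2).
    """
    m_total = src_mass.sum()
    com = (src_mass[:, None] * src_pos).sum(axis=0) / m_total
    return -m_total / np.linalg.norm(com - target)
```

When the target is well separated from the cell (a large "opening angle" criterion in tree-code language), the monopole already reproduces the exact potential to a fraction of a percent; FMM codes carry the expansion to higher order and also expand on the receiving side to reach their O(N) scaling.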

Results
Conclusion

