Abstract

Peridynamics (PD) methods describe the mechanical behaviour of solids well and are particularly suited to simulating problems with discontinuities. They apply to many fields, including materials science, human health, and industrial manufacturing, which motivates us to provide efficient PD simulations on the Sunway TaihuLight supercomputer. However, the massive and complex calculations of PD simulations, combined with the architectural characteristics of Sunway TaihuLight, make efficient parallel PD simulation challenging. In this paper, we present a series of performance optimization techniques that enable a large-scale parallel PD simulation application on Sunway TaihuLight. We first design data grouping and SPM-based caching to increase data transmission bandwidth and reduce main memory access time. We then design and implement vectorization and instruction-level optimizations for PD applications to improve computational performance. Finally, we provide strategies for overlapping data transmission with computation so that transmission time is hidden behind computation. Within a single core group, our work runs 181 times faster than the serial version on the SW26010 processor. Compared with the serial and single-CPU Peridigm-based simulations on an Intel Xeon E5-2680 V3, our work achieves speedups of 60 times and 6 times, respectively. Near-linear scalability is also obtained. In the weak scaling test, the simulation of a 296,222,720-point example achieves 1.14 PFLOPS with 8,192 processes (532,480 cores). In the strong scaling test, 90% parallel efficiency is observed as the number of processes increases 64-fold to 4,096.
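
The scaling figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes one process per SW26010 core group (each core group has 1 management core plus 64 compute cores, which is consistent with the quoted core count) and the conventional definition of strong-scaling parallel efficiency as speedup divided by the increase in process count. The function names are hypothetical and not taken from the paper.

```python
# Assumption: one process per SW26010 core group (1 MPE + 64 CPEs = 65 cores).
CORES_PER_CORE_GROUP = 65


def total_cores(processes: int) -> int:
    """Core count for a given number of processes, one process per core group."""
    return processes * CORES_PER_CORE_GROUP


def parallel_efficiency(t_base: float, t_scaled: float, scale_factor: float) -> float:
    """Strong-scaling efficiency: measured speedup over the ideal speedup."""
    return (t_base / t_scaled) / scale_factor


def implied_speedup(efficiency: float, scale_factor: float) -> float:
    """Speedup implied by a reported efficiency and process-count increase."""
    return efficiency * scale_factor


if __name__ == "__main__":
    # Weak-scaling run: 8,192 processes.
    print(total_cores(8192))
    # Strong-scaling run: 90% efficiency over a 64-fold increase in processes.
    print(implied_speedup(0.90, 64))
```

Under these assumptions, 90% efficiency over a 64-fold increase in processes corresponds to a speedup of roughly 57.6 over the baseline run.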
