Abstract

This article presents a case study on extending the parallel algorithms in tsunami and earthquake-cycle simulators for massively parallel execution on the K computer. We use two target applications: a tsunami-simulation program, “JAGURS,” and an earthquake-cycle program, “RSGDX.” Our optimization strategy for collective communication is to split the Message Passing Interface (MPI) communicator and perform multistage localized communication, which reduces the communication frequency, the transferred data size, and network congestion. Moreover, where load imbalances are severe, we apply cyclic distribution and extend the axes for parallelization. For each application, we conduct a performance evaluation with massively parallel execution on the K computer. With the optimized code, JAGURS attains a 21.8× speedup for collective communication and a 7.9× speedup for the time-step loop on 8748 nodes (69,984 cores). RSGDX attains a 4.25× speedup for collective communication and an 18.7× speedup for the time-step loop on 8192 nodes (65,536 cores).
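To illustrate why cyclic distribution can help under severe load imbalance, the following is a minimal sketch in plain Python, not the code used in JAGURS or RSGDX. The workload (16 work items whose cost grows linearly with their index), the rank count, and all function names are hypothetical and chosen only for illustration; no MPI calls are made.

```python
from typing import Callable, List


def block_owner(i: int, n_items: int, n_ranks: int) -> int:
    """Rank that owns item i when items are split into contiguous blocks."""
    chunk = -(-n_items // n_ranks)  # ceiling division
    return i // chunk


def cyclic_owner(i: int, n_ranks: int) -> int:
    """Rank that owns item i when items are dealt out round-robin."""
    return i % n_ranks


def per_rank_load(costs: List[int], owner: Callable[[int], int],
                  n_ranks: int) -> List[int]:
    """Sum the cost of the items assigned to each rank."""
    load = [0] * n_ranks
    for i, cost in enumerate(costs):
        load[owner(i)] += cost
    return load


def imbalance(load: List[int]) -> float:
    """Heaviest rank's load divided by the mean load (1.0 is perfect)."""
    return max(load) / (sum(load) / len(load))


# Hypothetical workload: item i costs i + 1 units, so the work is skewed
# toward high indices (an assumption made purely for this example).
N_RANKS = 4
costs = list(range(1, 17))

block_load = per_rank_load(
    costs, lambda i: block_owner(i, len(costs), N_RANKS), N_RANKS)
cyclic_load = per_rank_load(
    costs, lambda i: cyclic_owner(i, N_RANKS), N_RANKS)

print("block :", block_load, "imbalance", round(imbalance(block_load), 3))
print("cyclic:", cyclic_load, "imbalance", round(imbalance(cyclic_load), 3))
```

In this toy case the heaviest rank carries 58 cost units under block distribution and 40 under cyclic distribution, out of 136 in total. Because a time step finishes only when the slowest rank does, lowering the maximum per-rank load is what matters; the price is that each rank's items are no longer contiguous in memory.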
