Abstract

Scalable parallel algorithms for particle transport are one of the main application fields in high-performance computing. The discrete ordinates method (Sn) is one of the most popular deterministic numerical methods for solving particle transport equations. In this paper, we introduce a new method for large-scale heterogeneous computing of one-energy-group, time-independent, deterministic discrete ordinates neutron transport in 3D Cartesian geometry (Sweep3D) on the Tianhe-2A supercomputer. For heterogeneous programming, we use the customized Basic Communication Library (BCL) and Accelerated Computing Library (ACL) to control the Matrix2000 accelerator and to communicate between it and the CPU. We use OpenMP directives to exploit thread-level parallelism on the Matrix2000. The test results show that applying OpenMP to the particle transport algorithm, as modified by our method, yields a speedup of up to 11.3 times. On the Tianhe-2A supercomputer, the parallel efficiency on 1.01 million cores relative to 170 thousand cores is 52%.
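To make the 52% figure concrete, the following is a minimal sketch of how a relative parallel efficiency can be computed. It assumes a fixed total problem size (strong scaling), which the abstract does not state; the function names and the timing arguments are hypothetical and do not come from the paper.

```c
#include <assert.h>

/* Relative parallel efficiency of a large run against a smaller base run.
   ASSUMPTION: fixed total problem size (strong scaling). */
double relative_efficiency(double t_base, double cores_base,
                           double t_large, double cores_large) {
    assert(t_base > 0.0 && t_large > 0.0);
    assert(cores_base > 0.0 && cores_large > 0.0);
    double speedup = t_base / t_large;         /* achieved speedup */
    double ideal = cores_large / cores_base;   /* speedup if scaling were perfect */
    return speedup / ideal;
}

/* Speedup implied by a given relative efficiency, under the same assumption. */
double implied_speedup(double efficiency, double cores_base, double cores_large) {
    assert(efficiency > 0.0 && cores_base > 0.0 && cores_large > 0.0);
    return efficiency * (cores_large / cores_base);
}
```

With the rounded core counts quoted above, the ideal speedup is 1.01 million / 170 thousand, or about 5.9, so under this assumption an efficiency of 52% would correspond to an achieved speedup of roughly 3.1.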

Highlights

  • Particle transport plays an important role in modeling many physical phenomena and engineering problems

  • Compared with the original Sweep3D program, this method adds OpenMP thread-level parallelism and implements heterogeneous computing based on the Basic Communication Library (BCL) and the Accelerated Computing Library (ACL), which are highly customized for Tianhe-2A

  • Invoke ACL to start the accelerator Matrix2000
    Establish the connection between CPU and Matrix2000
    Invoke BCL to transport initialized data to Matrix2000
    /* the Source Iteration (SI) running on Matrix2000 */
    #pragma omp parallel for
    {
        Calculate source                                                  // Matrix2000
    }
    /* Wavefront sweeping in algorithm 1 */
    for iq 1 to 8 do
        for mo 1 to mmo do
            for kk 1 to kb do
                MPI recv east/west block I-inflows                        // CPU rank_id
                MPI recv south/north block J-inflows                      // CPU rank_id
                Invoke BCL to recv the block I-inflows from slave process // Matrix2000
                Invoke BCL to recv the block J-inflows from slave process // Matrix2000
                #pragma omp parallel for
                {
                    Calculate discrete source in Pn moments//
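The thread-level part of the sweep in the pseudocode above can be illustrated with a generic diagonal-wavefront loop. This is a minimal sketch under stated assumptions, not the paper's implementation: it handles a single octant and a single block, uses a toy averaging update in place of the real Sn balance equation, and omits the MPI and BCL inflow exchange entirely. The block dimensions and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block dimensions, chosen only for illustration. */
enum { NX = 8, NY = 6, NZ = 4 };

/* Flattened index of cell (i, j, k) in an NX x NY x NZ block. */
static size_t idx(int i, int j, int k) {
    return ((size_t)i * NY + (size_t)j) * NZ + (size_t)k;
}

/* Flux of an upstream neighbour, or a zero inflow outside the block. */
static double upstream(const double *phi, int i, int j, int k) {
    return (i < 0 || j < 0 || k < 0) ? 0.0 : phi[idx(i, j, k)];
}

/* Toy cell update standing in for the real Sn balance equation. */
static double update_cell(const double *phi, const double *src,
                          int i, int j, int k) {
    return 0.25 * (src[idx(i, j, k)]
                   + upstream(phi, i - 1, j, k)
                   + upstream(phi, i, j - 1, k)
                   + upstream(phi, i, j, k - 1));
}

/* Reference sweep in plain loop order: upstream cells are always done first. */
void sweep_serial(double *phi, const double *src) {
    assert(phi != NULL && src != NULL);
    for (int i = 0; i < NX; ++i)
        for (int j = 0; j < NY; ++j)
            for (int k = 0; k < NZ; ++k)
                phi[idx(i, j, k)] = update_cell(phi, src, i, j, k);
}

/* Wavefront sweep: cells on the plane i + j + k == d read only plane d - 1,
   so all cells of one plane can be updated by different threads. */
void sweep_wavefront(double *phi, const double *src) {
    assert(phi != NULL && src != NULL);
    for (int d = 0; d <= NX + NY + NZ - 3; ++d) {
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < NX; ++i) {
            for (int j = 0; j < NY; ++j) {
                int k = d - i - j;
                if (k >= 0 && k < NZ)
                    phi[idx(i, j, k)] = update_cell(phi, src, i, j, k);
            }
        }
    }
}
```

Calling sweep_serial and sweep_wavefront on the same source array should give matching fluxes, because each cell performs the same arithmetic on the same upstream values; only the order in which independent cells are visited differs. Compiled without OpenMP support, the pragma is ignored and the wavefront loop simply runs serially.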

Summary

INTRODUCTION

Particle transport plays an important role in modeling many physical phenomena and engineering problems. Gong et al. (2011) and Gong et al. (2012) designed a large-scale heterogeneous parallel algorithm for GPUs by exploiting the fine-grained thread-level parallelism of particle transport problems, which broke through the limitations of particle simulation and took full advantage of the GPU architecture. Wang et al. (2015) designed a version of Sweep3D with thread-level parallelism and vectorization acceleration, ported it to MIC many-core coprocessors, and applied the Roofline model to assess the absolute performance of the optimizations. Based on Sweep3D, we design and develop a method of large-scale heterogeneous computing for 3D deterministic particle transport on the Tianhe-2A supercomputer. Compared with the original Sweep3D program, this method adds OpenMP thread-level parallelism and implements heterogeneous computing based on the Basic Communication Library (BCL) and the Accelerated Computing Library (ACL), which are highly customized for Tianhe-2A.

Sweep3D
Matrix2000 Accelerator
Heterogeneous Parallel Algorithms
OpenMP Thread Level Parallelism
Flux Fixup
EXPERIMENT AND RESULTS
OpenMP Performance Optimization Test
Large-Scale Extension Test on Tianhe-2A Supercomputer
CONCLUSION AND FUTURE WORK
DATA AVAILABILITY STATEMENT
