To further improve the efficiency of the unified gas-kinetic wave-particle (UGKWP) method for hypersonic rarefied non-equilibrium flows, particularly its particle simulation process, we present the first implementation of the three-dimensional UGKWP method on multiple graphics processing unit (GPU) devices. The wave and particle evolution components of the method are handled with cell-parallel and particle-parallel strategies, respectively, so that the main loop of the GPU-based UGKWP (GPU-UGKWP) method is executed entirely by compute unified device architecture (CUDA) threads on the GPU devices. Communication between central processing unit (CPU) nodes is handled with the message passing interface (MPI) model. We also introduce a memory management scheme tailored to the GPU-UGKWP method that enables efficient access to the particle array. Performance comparisons show that, relative to a single Intel Xeon Gold 6148 CPU core, the Nvidia Tesla P100 achieves a total speedup of 34 with one GPU device and 226 with eight GPU devices, while a single Nvidia Titan V GPU device attains a speedup of 62. The speedup results on multiple CPU cores and GPU devices demonstrate that the GPU-based algorithm is better suited to computationally demanding tasks, particularly particle-dominated simulations. The reduced calculation time for a hypersonic technology vehicle simulation performed on the P100 cluster further shows that GPU devices significantly outperform their CPU counterparts.
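
As a rough reading of the P100 figures quoted above, the multi-GPU scaling can be summarized by a parallel efficiency. The symbols S_1, S_8, and E_8 are introduced here for illustration only, and the estimate assumes that both speedups were measured on the same test case against the same single-core baseline, which the abstract does not state explicitly.

```latex
% S_1: speedup with one P100, S_8: speedup with eight P100s (both vs. one CPU core)
% E_8: illustrative strong-scaling efficiency over eight GPU devices
\[
  \frac{S_8}{S_1} = \frac{226}{34} \approx 6.65,
  \qquad
  E_8 = \frac{S_8}{8\,S_1} = \frac{226}{272} \approx 0.83 .
\]
```

Under that assumption, eight devices deliver about 6.6 times the throughput of one, or roughly 83% of ideal linear scaling.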