Abstract

In general, race conditions can be resolved by introducing synchronisation or by breaking data dependencies. Atomic operations and graph coloring are the two typical approaches to avoiding them. Graph coloring algorithms have generally been regarded in the literature as the better choice because of their lock-free implementations. In this paper, we present GPU-accelerated algorithms for PHengLEI, an unstructured cell-centered finite volume Computational Fluid Dynamics (CFD) software framework that was originally developed for aerodynamics applications with arbitrary hybrid meshes. Overall, the newly developed GPU framework demonstrates a speedup of up to 4.8 compared with 18 MPI tasks running on the latest Intel CPU node. Furthermore, considerable effort has been invested in optimising the data dependencies that can lead to race conditions, which arise from the indirect addressing of unstructured meshes and the associated reduction operations. We carefully compare our optimised graph coloring with atomic operations in a series of numerical tests with different mesh sizes. The results show that atomic operations are more efficient than our optimised graph coloring in all of the test cases on an Nvidia Tesla V100 GPU. Specifically, for the summation operation, atomicAdd is twice as fast as graph coloring. For the maximum operation, atomicMax achieves a speedup of 1.5 to 2 over graph coloring.
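
To make the race condition concrete, the following is a minimal serial sketch in Python of the graph coloring idea for a face-to-cell summation; it is not the PHengLEI implementation. The names `color_faces`, `accumulate_by_color`, and `accumulate_reference`, the `left`/`right` face-to-cell arrays, and the sign convention (flux leaves the left cell and enters the right cell) are all assumptions made for illustration. Faces that share a cell receive different colors, so all faces of one color could be processed concurrently with plain updates, whereas an atomic approach would process every face in a single pass and rely on atomicAdd to serialise conflicting updates.

```python
def color_faces(left, right):
    """Greedy coloring: two faces that touch the same cell never share a color.

    left[f] and right[f] are the (hypothetical) cell indices on either side of face f.
    """
    used = {}    # cell index -> set of colors already taken by its faces
    colors = []
    for l, r in zip(left, right):
        taken = used.setdefault(l, set()) | used.setdefault(r, set())
        c = 0
        while c in taken:   # smallest color not yet used by either neighbor cell
            c += 1
        colors.append(c)
        used[l].add(c)
        used[r].add(c)
    return colors


def accumulate_by_color(left, right, flux, n_cells):
    """Sum face fluxes into cells one color at a time.

    On a GPU each color would correspond to one parallel pass; within a color
    the faces touch disjoint cells, so plain (non-atomic) updates are safe.
    """
    colors = color_faces(left, right)
    res = [0.0] * n_cells
    for c in range(max(colors, default=-1) + 1):
        for f, face_color in enumerate(colors):
            if face_color == c:
                res[left[f]] -= flux[f]    # assumed convention: flux leaves the left cell
                res[right[f]] += flux[f]   # and enters the right cell
    return res


def accumulate_reference(left, right, flux, n_cells):
    """Single pass over all faces, the result an atomic-based pass should match
    up to floating-point summation order."""
    res = [0.0] * n_cells
    for f in range(len(flux)):
        res[left[f]] -= flux[f]
        res[right[f]] += flux[f]
    return res


# Tiny 1D example: four cells in a row joined by three faces.
left = [0, 1, 2]
right = [1, 2, 3]
flux = [1.0, 2.0, 3.0]
print(color_faces(left, right))
print(accumulate_by_color(left, right, flux, 4))
```

In this sketch the coloring costs extra preprocessing and splits the work into several smaller passes, which is one plausible reason a single atomic pass can come out ahead, as reported above for the V100.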
