In this study, a class of GPU-based corrected explicit-implicit domain decomposition schemes (GCEIDD) is proposed to accelerate the solution of convection-dominated diffusion problems. All of these schemes combine fractional-step and corrected explicit-implicit domain decomposition techniques. In each sweep, the domain is decomposed into many strip-shaped subdomains. The aim is to reduce the size and increase the number of tri-diagonal systems, so that there is enough parallelism to keep the GPU occupied. In each step of the algorithms, we use low-complexity schemes with a high degree of parallelism. The generic form of GCEIDD allows different schemes to be chosen for the prediction, the correction, and the one-dimensional implicit solution. We propose two methods, one based on modified upwind finite differences and one on characteristic finite differences, and a combined method that bypasses the limitations of these two. The aspects of CUDA programming that are critical to the implementation of the GCEIDD algorithm are also discussed in detail. For solving the tri-diagonal systems, a three-stage strategy is proposed that balances occupancy, cache hit rate, and shared memory usage, and reduces cache contention. Results show that GCEIDD retains good accuracy and stability even when the number of subdomains is very large. The proposed method accelerates the solution by a factor of up to 3 compared with a GPU implementation of the classic fractional step methods.
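To illustrate why many small tri-diagonal systems expose parallelism, the following is a minimal sketch of the standard Thomas algorithm applied to a batch of independent systems. It is not the three-stage GPU strategy of the paper; the function names `thomas_solve` and `solve_batch` are hypothetical, and the sketch assumes diagonally dominant systems so that no pivoting is needed. Each system in the batch is independent of the others, which is the property that would let a GPU assign one system per thread.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve one tri-diagonal system with the Thomas algorithm.

    lower[i] multiplies x[i-1], diag[i] multiplies x[i], and upper[i]
    multiplies x[i+1]. lower[0] and upper[-1] are ignored.
    Assumes a diagonally dominant system (no pivoting is done).
    """
    n = len(diag)
    c = [0.0] * n  # modified upper-diagonal coefficients
    d = [0.0] * n  # modified right-hand side

    # Forward elimination.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_batch(systems):
    """Solve independent systems one by one.

    On a GPU each iteration of this loop could run in its own thread,
    since no system depends on the result of another.
    """
    return [thomas_solve(lo, di, up, b) for (lo, di, up, b) in systems]
```

The cost of one solve grows linearly with the system size, so splitting long grid lines into short strips gives many cheap, independent solves instead of a few long sequential ones.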