Abstract

In this paper, the performance of the Cyclic Reduction (CR) algorithm for solving tridiagonal systems is improved through efficient global memory transactions on Graphics Processing Units (GPUs). To achieve maximum memory throughput at a lower computational runtime, two sorting algorithms are introduced for reordering the initial system of equations: direct and step-by-step. The latter is shown to be well suited to modern GPUs and achieves speedups of up to 3.47× in single precision and 2.1× in double precision compared to the CPU Thomas algorithm. With the new global memory implementation, the CR solver runs 2×–100× faster than previously reported parallel tridiagonal solvers. The CR solver is also applied to 2D and 3D compressible viscous flow simulations using a high-order compact finite-difference scheme. In these simulations, the filtering, primitive-variable, and flux-derivative calculations are carried out with the parallel tridiagonal solver on the GPU. The GPU-accelerated calculations achieve speedups of 1.9×–15.2× in 2D and 6.4×–20.3× in 3D over CPU computations, depending on grid size. The computations are performed on an NVIDIA GTX480 GPU, and runtimes are compared with those obtained on a single core of an Intel Core 2 Duo (2.7 GHz, 2 MB cache).
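For readers unfamiliar with the two solvers named above, the following is a minimal sketch, not the paper's implementation. It shows the serial Thomas algorithm next to a sequential emulation of cyclic reduction for a system a[i]·x[i-1] + b[i]·x[i] + c[i]·x[i+1] = d[i]. The function names, the list-based storage, and the convention a[0] = c[n-1] = 0 are assumptions made for illustration. The global memory reordering (direct and step-by-step sorting) that the paper contributes is not reproduced here.

```python
def thomas(a, b, c, d):
    """Serial Thomas algorithm (assumes a[0] == 0 and c[-1] == 0)."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination: each row depends on the previous one.
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution.
    x = [0.0] * n
    x[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def cyclic_reduction(a, b, c, d):
    """Sequential emulation of cyclic reduction (same input convention).

    The iterations of each inner loop are independent of one another;
    on a GPU they would typically be assigned to separate threads.
    """
    a, b, c, d = list(a), list(b), list(c), list(d)  # work on copies
    n = len(d)
    s = 1
    # Forward reduction: rows 2s-1, 4s-1, ... eliminate neighbours i-s, i+s.
    while 2 * s - 1 < n:
        for i in range(2 * s - 1, n, 2 * s):
            lo, hi = i - s, i + s
            k1 = a[i] / b[lo]
            k2 = c[i] / b[hi] if hi < n else 0.0
            b[i] -= c[lo] * k1 + (a[hi] * k2 if hi < n else 0.0)
            d[i] -= d[lo] * k1 + (d[hi] * k2 if hi < n else 0.0)
            a[i] = -a[lo] * k1
            c[i] = -c[hi] * k2 if hi < n else 0.0
        s *= 2
    # Back substitution: solve the last row, then fill in each coarser level.
    x = [0.0] * n
    t = s
    while t >= 1:
        for i in range(t - 1, n, 2 * t):
            rhs = d[i]
            if i - t >= 0:
                rhs -= a[i] * x[i - t]
            if i + t < n:
                rhs -= c[i] * x[i + t]
            x[i] = rhs / b[i]
        t //= 2
    return x
```

The Thomas loop carries a dependency from each row to the next, whereas every level of cyclic reduction updates its rows independently. That independence is what makes CR a natural fit for GPUs, at the cost of strided memory accesses that grow with each level.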

