Abstract

The solution of tridiagonal systems of equations using graphics processing units (GPUs) is assessed. The parallel Thomas algorithm (PTA) is developed, and its solution is compared with two known parallel algorithms: cyclic reduction (CR) and parallel cyclic reduction (PCR). The lid-driven cavity problem is used to assess these parallel approaches. The same problem is also simulated using the classic Thomas algorithm running on a central processing unit (CPU). Runtimes and physical parameters of the GPU and CPU algorithms are compared. The results show that the speedups of CR, PCR and PTA over the CPU runtime are 4.4x, 5.2x and 38.5x, respectively. Furthermore, the effect of coalesced and uncoalesced access to GPU global memory is examined for PTA, and a 2x speedup is achieved with coalesced access. Additionally, the PTA performance in a time-dependent problem, the unsteady flow over a square, is assessed, and a 9x speedup is obtained over the CPU.

Highlights

  • In recent years, the use of graphics processing units (GPUs) as parallel processors and accelerators for scientific calculations has been growing

  • The results show that the speedups of cyclic reduction (CR), parallel cyclic reduction (PCR) and the parallel Thomas algorithm (PTA) over the central processing unit (CPU) runtime are about 4.4x, 5.2x and 38.5x, respectively

  • To assess the PCR, CR, PTA and classic Thomas algorithms and compare their runtimes, the steady lid-driven cavity flow and the steady/unsteady flow over a square cylinder inside a channel are considered at various Reynolds numbers


Summary

Introduction

The use of graphics processing units (GPUs) as parallel processors and accelerators for scientific calculations has been growing. Kim et al. [8] proposed breaking a problem with a large tridiagonal system of equations into multiple independent subproblems and solving the smaller systems using tiled PCR and a thread-level parallel Thomas algorithm (p-Thomas). With this strategy, speedups of 8x to 49x were reported. In 2014, Giles et al. [11] discussed the implementation of one-factor and three-factor PDE models on GPUs for both explicit and implicit time-marching methods. They introduced a nonstandard hybrid Thomas/PCR algorithm for solving the tridiagonal systems in the implicit solver. To assess PTA performance in time-dependent problems, the unsteady flow over a square is solved and a speedup of around 9x is obtained.
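For orientation, the classic Thomas algorithm mentioned above can be sketched as follows. This is a minimal serial illustration in Python, not the authors' GPU implementation. The names `thomas_solve` and `solve_batch` are hypothetical, and the sketch assumes no pivoting is needed (for example, a diagonally dominant matrix). The batch helper only mimics the thread-level idea of giving each independent system to one worker; on a GPU each loop iteration would correspond to one thread.

```python
def thomas_solve(a, b, c, d):
    """Solve one tridiagonal system with the classic Thomas algorithm.

    a: sub-diagonal (a[0] is unused), b: main diagonal,
    c: super-diagonal (c[-1] is unused), d: right-hand side.
    All four sequences have the same length n.
    Assumption: no pivoting is required.
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward elimination
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_batch(systems):
    """Solve many independent tridiagonal systems one after another.

    Hypothetical helper: each (a, b, c, d) tuple is independent of the
    others, so on a GPU each system could be assigned to its own thread.
    """
    return [thomas_solve(a, b, c, d) for (a, b, c, d) in systems]
```

A short usage example, with a 3x3 system whose exact solution is (1, 2, 3):

```python
a = [0.0, -1.0, -1.0]
b = [2.0, 2.0, 2.0]
c = [-1.0, -1.0, 0.0]
d = [0.0, 0.0, 4.0]
x = thomas_solve(a, b, c, d)
```

The forward and backward sweeps are inherently sequential within one system, which is why parallel variants exploit either many independent systems or reduction-style reformulations such as CR and PCR.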

Flow model and the governing equations
Parallel processing on GPUs
Tridiagonal matrix solvers
Thomas algorithm
Cyclic reduction algorithm
Parallel cyclic reduction algorithm
Results and discussions
Steady lid-driven cavity flow
Flow past a square cylinder inside a channel
Conclusion
