Abstract

In this study, a high-order compact finite difference scheme for the solution of fluid flow problems is implemented to run on a Graphics Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA). Besides the compact scheme, a high-order low-pass filter is also employed. For time integration, the classical fourth-order Runge–Kutta method is used. Two basic flows, the advection of a vortical disturbance and a temporal mixing layer, are chosen for the application of this numerical method on a Tesla C1060, one of NVIDIA's scientific computing GPUs. The results are compared with those obtained on a single-core CPU (AMD Phenom, 2.5 GHz) in terms of computation time. The CPU code uses the LAPACK/BLAS libraries to solve the cyclic tridiagonal systems generated by the compact solution and filtering schemes, whereas the GPU code solves the same linear systems by multiplying with the inverse of the coefficient matrix via the CUBLAS library. Moreover, the GPU's shared memory is employed to ease coalescing issues in some parts of the GPU code. Speedups between 9x and 16.5x are achieved for different mesh sizes in comparison to the CPU computations.
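The precomputed-inverse strategy the abstract describes can be sketched in a few lines. The following NumPy example is illustrative only: the coefficient values and mesh size are hypothetical (a sub/super-diagonal of 1/3 resembles a typical sixth-order periodic compact scheme, but the paper's exact coefficients are not given here), and the dense matrix-vector product stands in for the CUBLAS call the GPU code would issue.

```python
import numpy as np

def cyclic_tridiag(n, a, b, c):
    # Build an n x n cyclic tridiagonal matrix with sub-diagonal a,
    # diagonal b, super-diagonal c; the wrap-around corner entries
    # arise from the periodic boundary conditions.
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = b
        A[i, (i - 1) % n] = a
        A[i, (i + 1) % n] = c
    return A

# Illustrative coefficients for a periodic compact scheme's left-hand side.
n = 64
A = cyclic_tridiag(n, 1/3, 1.0, 1/3)

# Precompute the inverse once; every subsequent "solve" is then a dense
# matrix-vector product, which maps onto a single gemv/gemm call in CUBLAS.
A_inv = np.linalg.inv(A)

rhs = np.random.rand(n)
x_via_inverse = A_inv @ rhs          # GPU-style solve: one multiplication
x_direct = np.linalg.solve(A, rhs)   # CPU-style solve: factorization-based

assert np.allclose(x_via_inverse, x_direct)
```

Trading a tridiagonal factorization for a dense multiplication costs extra arithmetic, but the multiplication is embarrassingly parallel and keeps the GPU busy, which is why it can still win on hardware like the Tesla C1060.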
