A comparison of CPU and GPU implementations for solving the Convection Diffusion equation using the local Modified SOR method

Yiannis Cotronis, Elias Konstantinidis, Nikolaos M. Missirlis, Maria A. Louka

doi:10.1016/j.parco.2014.02.002

Abstract

In this paper we study a parallel form of the SOR method, suitable for GPUs using CUDA, for the numerical solution of the convection-diffusion equation. To exploit the parallelism offered by GPUs we adopt a fine-grained parallelism model, which is achieved by using the local relaxation version of SOR. More specifically, we use SOR with red-black ordering and two sets of parameters, $\omega_{1,ij}$ and $\omega_{2,ij}$, for the 5-point stencil. The parameter $\omega_{1,ij}$ is associated with each red grid point $(i,j)$ ($i+j$ even), whereas the parameter $\omega_{2,ij}$ is associated with each black grid point $(i,j)$ ($i+j$ odd). Using a separate parameter for each grid point avoids the global communication required for the adaptive determination of the best value of $\omega$ and also increases the convergence rate of the SOR method (Varga, 1962) [38] and (Young, 1971) [41]. We present our strategy and the results of our effort to exploit the computational capabilities of GPUs under the CUDA environment. Additionally, two parallel CPU programs using manual SSE2 (Streaming SIMD Extensions 2) and AVX (Advanced Vector Extensions) vectorization were developed as performance references. The optimizations applied to the GPU version were also considered for the CPU version. Significant performance improvement was achieved with all three GPU kernels developed, which differ in the degree of recomputation and therefore in the ratio of flops per element access.
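The red-black scheme with one relaxation parameter per grid point, as described in the abstract, can be illustrated with a minimal serial sketch. This is not the authors' CUDA or SSE2/AVX code. The function names `red_black_sor` and `demo`, the constant stencil coefficients of 0.25 (the pure-diffusion case with no convection terms), and the uniform relaxation value 1.2 are all assumptions chosen for illustration. The formulas that the local Modified SOR method uses to compute the per-point parameters are not given in the abstract and are not reproduced here.

```python
def red_black_sor(u, w_red, w_black, coef, sweeps):
    """Run in-place red-black SOR sweeps on a 2D grid stored as a list of lists.

    u       : grid values; the outermost rows and columns hold fixed boundary data
    w_red   : per-point relaxation factors, read at red points (i + j even)
    w_black : per-point relaxation factors, read at black points (i + j odd)
    coef    : (left, right, top, bottom) 5-point stencil weights, assumed constant
    """
    left, right, top, bottom = coef
    n, m = len(u), len(u[0])
    for _ in range(sweeps):
        # Red points are updated first, then black points. Points of one colour
        # depend only on neighbours of the other colour, so each half-sweep
        # could be executed fully in parallel.
        for parity, w in ((0, w_red), (1, w_black)):
            for i in range(1, n - 1):
                for j in range(1, m - 1):
                    if (i + j) % 2 != parity:
                        continue
                    stencil = (left * u[i - 1][j] + right * u[i + 1][j]
                               + bottom * u[i][j - 1] + top * u[i][j + 1])
                    # Relaxation with a factor that is local to point (i, j).
                    u[i][j] = (1.0 - w[i][j]) * u[i][j] + w[i][j] * stencil
    return u


def demo(n=6, omega=1.2, sweeps=200):
    """Hypothetical test problem: boundary fixed at 1, interior started at 0.

    With all four stencil weights equal to 0.25 the exact discrete solution is
    1 everywhere, so the returned value is the maximum interior error.
    """
    u = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0 for j in range(n)]
         for i in range(n)]
    # Assumption: the same factor at every point. A local method would fill
    # these two arrays with point-dependent values instead.
    w_red = [[omega] * n for _ in range(n)]
    w_black = [[omega] * n for _ in range(n)]
    red_black_sor(u, w_red, w_black, (0.25, 0.25, 0.25, 0.25), sweeps)
    return max(abs(u[i][j] - 1.0) for i in range(1, n - 1) for j in range(1, n - 1))
```

Calling `demo()` returns the maximum interior error after the sweeps. Because each colour is updated using only values of the other colour, and each point reads its own relaxation factor, no global reduction is needed inside a sweep, which is the property that makes the scheme attractive for fine-grained parallel hardware.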
