Abstract

In this paper we study a parallel form of the SOR method for the numerical solution of the convection-diffusion equation, suitable for GPUs using CUDA. To exploit the parallelism offered by GPUs we adopt a fine-grained parallelism model, which is achieved by using the local relaxation version of SOR. More specifically, we use SOR with red-black ordering and two sets of parameters, $\omega_{1,ij}$ and $\omega_{2,ij}$, for the 5-point stencil. The parameter $\omega_{1,ij}$ is associated with each red (i+j even) grid point (i,j), whereas the parameter $\omega_{2,ij}$ is associated with each black (i+j odd) grid point (i,j). Using a separate parameter for each grid point avoids the global communication required for the adaptive determination of the best value of $\omega$ and also increases the convergence rate of the SOR method (Varga, 1962) [38] and (Young, 1971) [41]. We present our strategy and the results of our effort to exploit the computational capabilities of GPUs under the CUDA environment. Additionally, two parallel CPU programs using manual SSE2 (Streaming SIMD Extensions 2) and AVX (Advanced Vector Extensions) vectorization were developed as performance references. The optimizations applied to the GPU version were also considered for the CPU version. All three GPU kernels we developed achieved a significant performance improvement; they differ in the degree of recomputation, which affects the ratio of flops per element access.
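The red-black local relaxation scheme described in the abstract can be illustrated with a small CPU sketch. This is not the paper's CUDA kernel: it is a plain Python illustration under stated assumptions. The model problem (u_xx + u_yy - a u_x - b u_y = 0 with constant coefficients `a`, `b`, central differences, and mesh size `h`) and the function name `red_black_local_sor` are hypothetical choices made for this example. The per-point parameters are passed in as arrays `omega_red` and `omega_black`, because the abstract does not give the formulas used to compute them.

```python
def red_black_local_sor(u, omega_red, omega_black, a, b, h, sweeps):
    """Red-black SOR with one relaxation parameter per grid point (in place).

    Assumed model problem: u_xx + u_yy - a*u_x - b*u_y = 0, discretized with
    central differences on a uniform grid; the boundary rows and columns of
    u hold Dirichlet values and are never modified.
    """
    n, m = len(u), len(u[0])
    # 5-point stencil weights for the assumed constant-coefficient problem.
    w_e = (1.0 - 0.5 * a * h) / 4.0  # multiplies u[i+1][j]
    w_w = (1.0 + 0.5 * a * h) / 4.0  # multiplies u[i-1][j]
    w_n = (1.0 - 0.5 * b * h) / 4.0  # multiplies u[i][j+1]
    w_s = (1.0 + 0.5 * b * h) / 4.0  # multiplies u[i][j-1]
    for _ in range(sweeps):
        # Red points (i+j even) first, then black points (i+j odd).
        for parity, omega in ((0, omega_red), (1, omega_black)):
            for i in range(1, n - 1):
                # First interior column j with (i + j) % 2 == parity.
                j0 = 1 + ((i + 1 + parity) % 2)
                for j in range(j0, m - 1, 2):
                    gs = (w_e * u[i + 1][j] + w_w * u[i - 1][j]
                          + w_n * u[i][j + 1] + w_s * u[i][j - 1])
                    # Local relaxation: omega is looked up per grid point.
                    u[i][j] += omega[i][j] * (gs - u[i][j])
    return u


# Usage: 9x9 grid, boundary fixed at 1, interior started at 0.
# A uniform value 1.4 stands in for the per-point parameters (placeholder).
n = 9
grid = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0
         for j in range(n)] for i in range(n)]
omega = [[1.4] * n for _ in range(n)]
red_black_local_sor(grid, omega, omega, a=1.0, b=1.0, h=1.0 / (n - 1), sweeps=200)
```

Within one color, every update reads only points of the other color, so all points of that color can be updated independently. That independence is what makes a mapping of one GPU thread per grid point possible, and because each point carries its own parameter, no global reduction is needed to tune a single $\omega$.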
