Advances in the programmability of graphics hardware have opened new opportunities in high performance computing (HPC). The computational fluid dynamics (CFD) community, a significant user of HPC, has started exploiting the inherent data parallelism of numerical solvers to make efficient use of these many-core, high-throughput, accelerator-based processors. In the present work, we examine the process of accelerating our CPU-based Staggered Update Procedure (SUP) solver, a higher-order accurate cell-centred finite volume solver, by off-loading the most computationally expensive region of the code, the explicit residual computation, to the GPU. We have adopted OpenACC, a directive-based programming model, to expose parallelism in the code. The framework developed for GPU porting in the context of SUP is also of value to those intending to port CFD solvers based on classical finite volume methodology. The performance analysis is conducted using scalar convection–diffusion equations in two and three dimensions. The findings demonstrate speedup factors of 9 in 2D and 28 in 3D for the explicit residual computation alone, achieved with a single NVIDIA Tesla V100 GPU card. In addition, we establish superior algorithmic scalability by recovering near-perfect serial performance on the heterogeneous CPU+GPU architecture. Further overall acceleration of the code can be achieved by porting other parts of the solver to the GPU.