Optimizing Computations in the OpenFOAM Package on GPUs

A Monakov

doi:10.15514/ispras-2012-22-14

Abstract

The paper presents preliminary research on improving the performance of CFD simulations in OpenFOAM by offloading part of the computation (specifically, the solution of linear systems) to a graphics accelerator (GPU). We give a short review of the OpenFOAM package and describe porting the conjugate gradient method to the GPU architecture using the CUDA programming model. Porting the basic algorithm is straightforward; however, care should be taken to avoid unnecessary copying over the PCI-Express bus. Efficient preconditioning on the GPU is then discussed. We use approximate inverse (AINV) preconditioning, which can be implemented with good parallelism on the GPU. To amortize the cost of preparing the preconditioner, we allow preconditioners to be reused on the GPU and compute them on the CPU asynchronously in a helper thread. We mention several optimization opportunities: reordering the preconditioner to upper-left triangular form so that CUDA blocks multiplying by denser parts of the preconditioner factors are scheduled first; using single-precision storage for the preconditioner to save memory bandwidth; reordering the mesh with the nested dissection method from the METIS library; and using mixed-precision iteration for the conjugate gradient method. Preliminary performance tests show an improvement starting from 64,000-cell meshes and reaching 2x for a 1-million-cell mesh in a non-parallel run. As future work we mention support for parallel runs with MPI, research into other solvers such as multigrid, BiCGStab and IDR, and automatic selection of the drop tolerance for the AINV preconditioner.

Highlights

We present a short review of the OpenFOAM package and describe porting the conjugate gradient method to the GPU architecture using the CUDA programming model.
We mention several optimization opportunities: reordering the preconditioner to upper-left triangular form so that CUDA blocks multiplying by denser parts of the preconditioner factors are scheduled first; using single-precision storage for the preconditioner to save memory bandwidth; reordering the mesh with the nested dissection method from the METIS library; and using mixed-precision iteration for the conjugate gradient method.
Preliminary performance tests show an improvement starting from 64,000-cell meshes and reaching 2x for a 1-million-cell mesh in a non-parallel run.

Summary

Introduction

OpenFOAM is a package of libraries and programs for scientific computing, primarily in computational fluid dynamics and continuum mechanics [1]. OpenFOAM contains only a small number of methods for solving systems of linear algebraic equations. At the same time, for large problems the solution of these systems can account for a substantial share of the total simulation time. Among the linear solvers implemented in OpenFOAM is the preconditioned conjugate gradient method (PCG). This algorithm applies to systems with a symmetric matrix (in the context of OpenFOAM, these are usually the pressure systems). None of the solvers provided is a direct method (such as Gaussian elimination for dense systems); all are iterative: to solve the system Ax = b, they start from an initial guess x0 and repeatedly construct new approximations x1, x2, ..., which gradually approach the exact solution. It may seem that an accelerator with separate memory cannot significantly speed up the solution of these systems, because of the overhead of copying the system matrix and vectors into accelerator memory. In particular, for problems solved on orthogonal three-dimensional meshes, most rows of the system contain seven nonzero elements.
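As an illustration of the iteration described above, here is a minimal CPU sketch in plain Python. It is not the paper's CUDA implementation, and it does not use OpenFOAM data structures. The function names (`laplacian_7pt`, `spmv`, `jacobi_inverse`, `pcg`), the CSR storage layout, and the model matrix (6 on the diagonal, -1 for each of the six mesh neighbours) are assumptions made for the example. A simple diagonal (Jacobi) preconditioner stands in for the AINV preconditioner discussed in the paper.

```python
def laplacian_7pt(n):
    """Model symmetric matrix on an n*n*n orthogonal mesh, stored in CSR form.

    Each cell couples to itself and to its (up to six) face neighbours,
    so interior rows hold exactly seven nonzero entries.
    """
    def idx(i, j, k):
        return (i * n + j) * n + k

    row_ptr, cols, vals = [0], [], []
    neighbours = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                cols.append(idx(i, j, k))
                vals.append(6.0)
                for di, dj, dk in neighbours:
                    a, b, c = i + di, j + dj, k + dk
                    if 0 <= a < n and 0 <= b < n and 0 <= c < n:
                        cols.append(idx(a, b, c))
                        vals.append(-1.0)
                row_ptr.append(len(cols))
    return row_ptr, cols, vals


def spmv(A, x):
    """Sparse matrix-vector product y = A x for a CSR matrix."""
    row_ptr, cols, vals = A
    return [sum(vals[p] * x[cols[p]] for p in range(row_ptr[r], row_ptr[r + 1]))
            for r in range(len(x))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def jacobi_inverse(A):
    """Inverse of the diagonal of A (stand-in preconditioner, not AINV)."""
    row_ptr, cols, vals = A
    inv = []
    for r in range(len(row_ptr) - 1):
        for p in range(row_ptr[r], row_ptr[r + 1]):
            if cols[p] == r:
                inv.append(1.0 / vals[p])
    return inv


def pcg(A, b, inv_diag, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients, starting from x0 = 0."""
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    r = list(b)                                   # r0 = b - A x0 = b
    z = [d * ri for d, ri in zip(inv_diag, r)]    # apply the preconditioner
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = spmv(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            return x, it
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Manufactured problem: choose the exact solution, then build b = A x_exact.
n = 6
A = laplacian_7pt(n)
x_exact = [1.0] * n ** 3
b = spmv(A, x_exact)
x, iters = pcg(A, b, jacobi_inverse(A))
print(iters, max(abs(xi - 1.0) for xi in x))
```

Each iteration needs one sparse matrix-vector product, one preconditioner application, and a few vector updates and dot products. In a GPU port these operations would run on the device, so the matrix and vectors can stay in accelerator memory for the whole solve rather than being copied on every iteration.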

Preconditioning

Problem Statement

Implementing the Conjugate Gradient Method on the GPU

Computing the Preconditioner

Applying the Preconditioner

Reordering the Computational Mesh

Mixed-Precision Computations on the GPU

Speedup as a Function of Mesh Size

Main Results

Conclusion


Journal: Proceedings of the Institute for System Programming of the RAS
Publication Date: Jan 1, 2012
Citations: 1
License type: cc-by
