This paper describes the implementation of efficient and portable vectorized sweep kernels for solving the neutron transport equation on three-dimensional Cartesian grids, using the discrete ordinates (S_n) method for the angular variable and the diamond differencing (DD) scheme for the spatial discretization. Vectorization is applied across the angular directions that belong to the same octant and is independent of the spatial discretization order; the technique therefore extends directly to high-order DD or discontinuous Galerkin schemes. Our implementation is written in C++17 and relies on the Kokkos performance portability framework. This library allows one to express shared-memory parallelism (including vectorization) in a machine-independent way and supports many backends, including CUDA and OpenMP. Our vectorization procedure relies on the portable single instruction, multiple data (SIMD) types provided by Kokkos. The method has been implemented for DD schemes up to order 2 and yields promising results on CPUs supporting standard vector instructions.
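To illustrate the idea of vectorizing over directions within an octant, the following is a minimal sketch of a lowest-order DD cell update, not the implementation described in this paper. It assumes a fixed lane count `W` and uses a plain `std::array` as a stand-in for a portable SIMD type such as those provided by Kokkos; the names `Lanes`, `CellFluxes`, and `dd_cell_update` are hypothetical. Because all directions in one octant share the same sweep order, each lane applies the same arithmetic to a different direction, which is what makes the loop over lanes amenable to vector instructions.

```cpp
#include <array>
#include <cstddef>

// Hypothetical lane count; a real SIMD type would fix this from the hardware.
constexpr std::size_t W = 4;

// Stand-in for a portable SIMD vector: one value per angular direction.
using Lanes = std::array<double, W>;

// Cell-centered and outgoing face fluxes for W directions of one octant.
struct CellFluxes {
    Lanes center;
    Lanes x_out;
    Lanes y_out;
    Lanes z_out;
};

// Lowest-order diamond differencing update for one cell.
// mu, eta, xi: absolute values of the direction cosines, one per lane.
// x_in, y_in, z_in: incoming face fluxes, one per lane.
// q: isotropic source in the cell, sigma_t: total cross section.
// dx, dy, dz: cell sizes.
inline CellFluxes dd_cell_update(const Lanes& mu, const Lanes& eta,
                                 const Lanes& xi, const Lanes& x_in,
                                 const Lanes& y_in, const Lanes& z_in,
                                 double q, double sigma_t,
                                 double dx, double dy, double dz)
{
    CellFluxes out{};
    // Every lane executes identical operations, so a compiler (or a SIMD
    // type with overloaded operators) can map this loop to vector code.
    for (std::size_t l = 0; l < W; ++l) {
        const double cx = 2.0 * mu[l] / dx;
        const double cy = 2.0 * eta[l] / dy;
        const double cz = 2.0 * xi[l] / dz;

        // Balance equation combined with the diamond closure relations.
        const double c = (q + cx * x_in[l] + cy * y_in[l] + cz * z_in[l])
                       / (sigma_t + cx + cy + cz);

        out.center[l] = c;
        // Diamond closure: the center flux is the mean of the in/out faces.
        out.x_out[l] = 2.0 * c - x_in[l];
        out.y_out[l] = 2.0 * c - y_in[l];
        out.z_out[l] = 2.0 * c - z_in[l];
    }
    return out;
}
```

In a sweep, such a routine would be called cell by cell in the upwind order of the octant, with the outgoing face fluxes of one cell serving as incoming fluxes of its downwind neighbors. Replacing `Lanes` by a genuine SIMD type would remove the explicit lane loop while leaving the arithmetic unchanged.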