Three-dimensional Domain Decomposition Research Articles

A new dual-communicator algorithm with very favorable performance characteristics has been developed for direct numerical simulation (DNS) of turbulent mixing of a passive scalar governed by an advection–diffusion equation. We focus on the regime of high Schmidt number (Sc), where because of low molecular diffusivity the grid-resolution requirements for the scalar field are stricter than those for the velocity field by a factor of √Sc. Computational throughput is improved by simulating the velocity field on a coarse grid of N_v^3 points with a Fourier pseudo-spectral (FPS) method, while the passive scalar is simulated on a fine grid of N_θ^3 points with a combined compact finite difference (CCD) scheme which computes first and second derivatives at eighth-order accuracy. A static three-dimensional domain decomposition and a parallel solution algorithm for the CCD scheme are used to avoid the heavy communication cost of memory transposes. A kernel is used to evaluate several approaches to optimize the performance of the CCD routines, which account for 60% of the overall simulation cost. On the petascale supercomputer Blue Waters at the University of Illinois, Urbana–Champaign, scalability is improved substantially with a hybrid MPI-OpenMP approach in which a dedicated thread per NUMA domain overlaps communication calls with computational tasks performed by a separate team of threads spawned using OpenMP nested parallelism. At a target production problem size of 8192^3 (0.5 trillion) grid points on 262,144 cores, CCD timings are reduced by 34% compared to a pure-MPI implementation. Timings for 16384^3 (4 trillion) grid points on 524,288 cores encouragingly maintain scalability greater than 90%, although the wall clock time is too high for production runs at this size. Performance monitoring with CrayPat for problem sizes up to 4096^3 shows that the CCD routines can achieve nearly 6% of the peak flop rate. The new DNS code is built upon two existing FPS and CCD codes.
With the grid ratio N_θ/N_v = 8, the disparity in the computational requirements for the velocity and scalar problems is addressed by splitting the global communicator MPI_COMM_WORLD into disjoint communicators for the velocity and scalar fields, respectively. Inter-communicator transfer of the velocity field from the velocity communicator to the scalar communicator is handled with discrete send and non-blocking receive calls, which are overlapped with other operations on the scalar communicator. For production simulations at N_θ = 8192 and N_v = 1024 on 262,144 cores for the scalar field, the DNS code achieves 94% strong scaling relative to 65,536 cores and 92% weak scaling relative to N_θ = 1024 and N_v = 128 on 512 cores.
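
To make the communicator split and the grid disparity concrete, the following is a minimal sketch in plain Python, not the authors' code. It mimics the "color" assignment that a call such as MPI_Comm_split uses to partition MPI_COMM_WORLD into disjoint groups, and it works out the grid-point counts quoted above. The rank layout (velocity ranks first), the toy sizes, and the names split_color and split_world are assumptions made for illustration; the abstract does not say how ranks are assigned.

```python
def split_color(rank, n_velocity_ranks):
    """Return 0 for the velocity group and 1 for the scalar group.

    Assumed layout: the first n_velocity_ranks ranks handle the velocity field.
    """
    return 0 if rank < n_velocity_ranks else 1


def split_world(world_size, n_velocity_ranks):
    """Partition ranks 0..world_size-1 into two disjoint groups by color."""
    groups = {0: [], 1: []}
    for rank in range(world_size):
        groups[split_color(rank, n_velocity_ranks)].append(rank)
    return groups


# Grid sizes taken from the abstract.
N_theta, N_v = 8192, 1024
scalar_points = N_theta ** 3                  # 549,755,813,888 (about 0.5 trillion)
velocity_points = N_v ** 3                    # 1,073,741,824
ratio = scalar_points // velocity_points      # (N_theta / N_v)^3 = 8^3 = 512

# Toy example: 12 ranks in total, 4 of them (hypothetically) on the velocity field.
groups = split_world(world_size=12, n_velocity_ranks=4)
print(groups[0])   # velocity group ranks
print(groups[1])   # scalar group ranks
print(ratio)
```

With a grid ratio of 8 per direction, the scalar field carries 512 times as many grid points as the velocity field, which is why the two problems are given separate groups of processes rather than sharing one decomposition.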

With tens of petaflops supercomputers already in operation and exaflops machines expected to appear within the next 10 years, efficient parallel computational methods are required to take advantage of such extreme-scale machines. In this paper, we present a three-dimensional domain decomposition scheme for enabling large-scale electronic structure calculations based on density functional theory (DFT) on massively parallel computers. It is composed of two methods: (i) the atom decomposition method and (ii) the grid decomposition method. In the former method, we develop a modified recursive bisection method based on the moment of inertia tensor to reorder the atoms along a principal axis so that atoms that are close in real space are also close on the axis to ensure data locality. The atoms are then divided into sub-domains depending on their projections onto the principal axis in a balanced way among the processes. In the latter method, we define four data structures for the partitioning of grid points that are carefully constructed to make data locality consistent with that of the clustered atoms for minimizing data communications between the processes. We also propose a decomposition method for solving the Poisson equation using the three-dimensional FFT in Hartree potential calculation, which is shown to be better in terms of communication efficiency than a previously proposed parallelization method based on a two-dimensional decomposition. For evaluation, we perform benchmark calculations with our open-source DFT code, OpenMX, paying particular attention to the O(N) Krylov subspace method. The results show that our scheme exhibits good strong and weak scaling properties, with the parallel efficiency at 131,072 cores being 67.7% compared to the baseline of 16,384 cores with 131,072 atoms of the diamond structure on the K computer.
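
The atom decomposition idea can be illustrated with a plain recursive inertial bisection. This is a sketch of the textbook technique, not the modified method of the paper, whose details the abstract does not give. It uses the fact that the principal axis with the smallest moment of inertia is the direction of largest coordinate spread, so it is found here by power iteration on the 3x3 covariance matrix. The function names, the start vector, and the iteration count are assumptions for illustration.

```python
import math


def principal_axis(points, iters=200):
    """Unit vector along the largest spread of the points, which is the
    principal axis with the smallest moment of inertia (power iteration)."""
    n = len(points)
    center = [sum(p[k] for p in points) / n for k in range(3)]
    cov = [[sum((p[i] - center[i]) * (p[j] - center[j]) for p in points)
            for j in range(3)] for i in range(3)]
    v = [1.0, 0.7, 0.3]  # arbitrary start vector (assumption)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all points coincide, so any axis will do
            return [1.0, 0.0, 0.0]
        v = [x / norm for x in w]
    return v


def bisect(atoms, points, nproc):
    """Split atom indices into nproc balanced groups of spatially close atoms."""
    if nproc == 1:
        return [list(atoms)]
    if not atoms:
        return [[] for _ in range(nproc)]
    axis = principal_axis([points[a] for a in atoms])
    # Order atoms by their projection onto the principal axis.
    ordered = sorted(atoms,
                     key=lambda a: sum(points[a][k] * axis[k] for k in range(3)))
    left_procs = nproc // 2
    cut = len(ordered) * left_procs // nproc  # share proportional to process count
    return (bisect(ordered[:cut], points, left_procs)
            + bisect(ordered[cut:], points, nproc - left_procs))


# Toy example: 8 atoms on a line, listed in scrambled order, shared by 4 processes.
xs = [5, 2, 7, 0, 3, 6, 1, 4]
points = [(float(x), 0.0, 0.0) for x in xs]
parts = bisect(list(range(len(points))), points, 4)
print(parts)
```

Each group in the toy example holds two atoms that are neighbors along the line, which is the data-locality property the decomposition aims for: atoms that are close in real space end up on the same process.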

Articles published on Three-dimensional Domain Decomposition

Efficient parallel strategy for molecular plasmonics – A numerical tool for integrating Maxwell-Schrödinger equations in three dimensions

An upright bottomless vertical cylinder with baffles floating in waves

Motion of a floating body in a harbour by domain decomposition method

A dual communicator and dual grid-resolution algorithm for petascale simulations of turbulent mixing at high Schmidt number

Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations

Direct pore-to-core up-scaling of displacement processes: Dynamic pore network modeling and experimentation

A three-dimensional domain decomposition method for large-scale DFT electronic structure calculations

Performance measurement of magnetohydrodynamic code for space plasma on typical scalar-type supercomputer systems with a large number of cores

Accelerating large-scale simulation of seismic wave propagation by multi-GPUs and three-dimensional domain decomposition
