Distributed Memory Platforms Research Articles

Reducing the need for users to manually manage the details of work and data distribution is an important goal of high-level many-task runtime systems. For distributed memory platforms this means that the runtime system has to keep track of both fine-grained task dependencies and data residency meta-information. The amount of such meta-information is proportional to the granularity of parallelism which needs to be managed, introducing a trade-off. More precise tracking of data state allows leveraging more opportunities for compute and transfer parallelism, while also introducing more overhead. As such, the fidelity of the information being tracked needs to be managed carefully, ideally without introducing additional latency, communication or substantial compute overhead. We present the “Horizons” approach, designed to fulfill these goals. Specifically, horizons allow for the effective and efficient management of parallelism and the coalescing of previous fine-grained tracking information while maintaining an easily configurable scheduling window with full information precision. As an additional benefit, they provide consistent cluster-wide decision points without requiring any inter-node communication, and effectively cap the size of state tracking data structures even in the presence of problematic access patterns. Experimental evaluation on microbenchmarks and dry runs demonstrates that horizons are effective in keeping the scheduling complexity constant, while their own overhead is negligible: below 10 μs per horizon when building a command graph for 512 GPUs. We additionally demonstrate the performance impact of horizons, as well as their low overhead, on a real-world application.

This work is a user guide to the FEMPAR scientific software library. FEMPAR is an open-source object-oriented framework for the simulation of partial differential equations (PDEs) using finite element methods on distributed-memory platforms. It provides a rich set of tools for numerical discretization and built-in scalable solvers for the resulting linear systems of equations. An application expert that wants to simulate a PDE-governed problem has to extend the framework with a description of the weak form of the PDE at hand (and additional perturbation terms for non-conforming approximations). We show how to use the library by going through three different tutorials. The first tutorial simulates a linear PDE (Poisson equation) in a serial environment for a structured mesh using both continuous and discontinuous Galerkin finite element methods. The second tutorial extends it with adaptive mesh refinement on octree meshes. The third tutorial is a distributed-memory version of the previous one that combines a scalable octree handler and a scalable domain decomposition solver. The exposition is restricted to linear PDEs and simple geometries to keep it concise. The interested user can dive into more tutorials available in the FEMPAR public repository to learn about further capabilities of the library, e.g., nonlinear PDEs and nonlinear solvers, time integration, multi-field PDEs, block preconditioning, or unstructured mesh handling.

Program summary

Program Title: FEMPAR
Program Files doi: http://dx.doi.org/10.17632/dtx487wp57.1
Licensing provisions: GNU General Public License 3
Programming language: MPI, Fortran2003/2008 (Object-Oriented Programming features)
Nature of problem: Computational simulation of a broad range of large-scale application problems governed by Partial Differential Equations
Solution method: Arbitrary-order grad-, curl-, and div-conforming finite elements on n-cube and n-simplex meshes. Continuous and Discontinuous Galerkin FEM. Adaptive Mesh Refinement and Coarsening via forests-of-octrees. Diagonally Implicit Runge–Kutta time integrators. Newton–Raphson linearization. Block preconditioning for multiphysics applications. Multilevel Balancing Domain Decomposition by Constraints preconditioning. Krylov subspace iterative solvers. Sparse direct solvers.
Additional comments: Program GitHub repository https://github.com/fempar/fempar Program website http://www.fempar.org

Articles published on Distributed Memory Platforms

Runtime support for CPU-GPU high-performance computing on distributed memory platforms

Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach

Topology-free immersed boundary method for incompressible turbulence flows: An aerodynamic simulation for “dirty” CAD geometry

Efficient and Accurate Electromagnetic Angular Sweeping of Rough Surfaces by MPI Parallel Randomized Low-Rank Decomposition

A tutorial-driven introduction to the parallel finite element library FEMPAR v1.0.0

Distributed-memory parallelization of the aggregated unfitted finite element method

Managing Pending Events in Sequential and Parallel Simulations Using Three-tier Heap and Two-tier Ladder Queue

Efficient Subpopulation Based Parallel TLBO Optimization Algorithms

A highly scalable parallel encoder version of the emergent JEM video encoder

Efficient distributed memory management with RDMA and caching

A randomized least squares solver for terabyte-sized dense overdetermined systems

A matrix-algebraic formulation of distributed-memory maximal cardinality matching algorithms in bipartite graphs

Optimization of the Processing of Data Streams on Roughly Characterized Distributed Resources

Distributed memory parallel approaches for HEVC encoder

A COMPLEX MIX-SHIFTED PARALLEL QR ALGORITHM FOR THE C-METHOD

Complex Network Partitioning Using Label Propagation

Topology-oblivious optimization of MPI broadcast algorithms on extreme-scale platforms

Session details: Special Issue on Big Spatial Data (Part 2)

Experimenting with low-overhead OpenMP runtime on IBM Blue Gene/Q

Custom Microcoded Dynamic Memory Management for Distributed On-Chip Memory Organizations
