Linear Algebra Routines Research Articles

Processors with hundreds of threads of execution are among the state of the art in high-end computing systems. This transition to many-core computing has required the community to develop new algorithms that overcome significant latency bottlenecks through massive concurrency. However, implementing efficient parallel runtimes that scale to high concurrency levels with extremely fine-grained tasks remains a challenge. Existing techniques do not scale to a large number of threads because of the high cost of synchronization in concurrent data structures. We present a thorough analysis of the synchronization mechanisms typically used to build concurrent data structures in task-parallel runtime systems, including mutex, semaphore, spinlock and atomic fetch-and-add. To overcome their limitations, in a recent work we proposed XQueue, a novel lock-less concurrent queuing system with relaxed ordering semantics that is geared towards scalability up to hundreds of concurrent threads. In this work, we extend XQueue and present X-OpenMP, a library for enabling extremely fine-grained parallelism on modern many-core systems with hundreds of cores. Work stealing is a popular choice for load balancing in task-based runtime systems because it efficiently distributes the load across worker threads; however, traditional approaches rely on synchronization primitives, so work stealing can incur overheads. Here we implement a lock-less work-stealing algorithm for total-store-order (TSO) memory architectures and evaluate its performance using micro and macro benchmarks. We compare the performance of X-OpenMP with the native LLVM OpenMP, GNU OpenMP, OpenCilk and oneTBB implementations using task-based linear algebra routines from the PLASMA numerical library, Strassen’s matrix multiplication from the BOTS Benchmark Suite, and the Unbalanced Tree Search benchmark. Applications parallelized using OpenMP can run without modification by simply linking against the X-OpenMP library. X-OpenMP achieves up to 40X speedup compared to GNU OpenMP, up to 2X speedup compared to the native LLVM OpenMP, up to 6X speedup compared to OpenCilk and up to 5X speedup compared to oneTBB. The tasking overheads in X-OpenMP are reduced by 50% compared to the native LLVM OpenMP.
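To make the synchronization primitives named in the abstract concrete, the sketch below contrasts a mutex-protected counter with an atomic fetch-and-add counter in standard C++. It is only an illustration and is not taken from X-OpenMP or XQueue; the function names count_with_mutex and count_with_fetch_add are hypothetical, and the sketch says nothing about the relative costs measured in the paper.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of X-OpenMP): each worker increments a
// shared counter `iters` times while holding a std::mutex.
std::uint64_t count_with_mutex(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    std::uint64_t counter = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, &m, iters] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<std::mutex> guard(m);  // serializes every update
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}

// Hypothetical helper: the same work using atomic fetch-and-add, so no
// thread ever blocks on a lock.
std::uint64_t count_with_fetch_add(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) {
                // Relaxed ordering suffices here because only the final
                // total is read, and only after all threads have joined.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```

Both functions return threads × iters; timing them at increasing thread counts is one simple way to observe how the cost of each primitive grows with contention on a given machine.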


Thermal-FIST (Thermal, Fast and Interactive Statistical Toolkit) is a C++ package designed for convenient general-purpose physics analysis within the family of hadron resonance gas (HRG) models. This mainly includes the statistical analysis of particle production in heavy-ion collisions and the phenomenology of the hadronic equation of state. Notable features include fluctuations and correlations of conserved charges, effects of probabilistic decay, chemical non-equilibrium, and inclusion of van der Waals hadronic interactions. Calculations are possible within the grand canonical ensemble, the canonical ensemble, and mixed-canonical ensembles that combine the canonical treatment of certain conserved charges with the grand-canonical treatment of the others. The package contains a fast thermal event generator, which generates particle yields in accordance with the HRG chemistry and particle momenta based on the Blast Wave model. A distinct feature of this package is its graphical user interface frontend, QtThermalFIST, which is designed for fast and convenient general-purpose HRG model applications.

Program summary

Program Title: Thermal-FIST, version 1.2

Program Files doi: http://dx.doi.org/10.17632/pprr8p4fkp.1

Licensing provisions: GPLv3

Programming language: C++

External routines: Eigen template library for the linear algebra routines [1], MINUIT2 package from CERN ROOT [2], Mersenne Twister random number generator [3], Qt5 framework [4] (for the GUI only), QCustomPlot Qt widget [5] (for the GUI only)

Nature of problem: The HRG model and its various modifications constitute a common framework for modeling the hadronic equation of state and particle production in heavy-ion collisions. Even the simplest versions of the HRG model require careful consideration of many details, including the resonance decay feed-down, the implementation of charge conservation constraints relevant for heavy-ion collisions, and chemical non-equilibrium effects. Notable extra effort is required to treat the fluctuations and correlations of various charges, which are presently being studied extensively in heavy-ion collision experiments and lattice QCD calculations. The inclusion of hadronic interactions, modeled by an excluded-volume (EV) or a van der Waals (vdW) type framework, additionally requires a numerical solution to a system of many transcendental equations.

Solution method: The Thermal-FIST package contains a class-based library which calculates relevant HRG observables for a specified setup. The setup includes a particle list, usually supplied in an external file, an HRG model specification (statistical ensemble, van der Waals interaction parameters, etc.), a set of thermal parameters, and conservation-law constraints. Whenever necessary, the systems of transcendental equations are solved numerically with Broyden’s method. The package includes a fitter for extracting thermal parameters from hadron yield data through χ² minimization. The HRG-model-based Monte Carlo event generator is a complementary feature to the analytic calculations. General-purpose thermal analysis is made maximally convenient with QtThermalFIST, a GUI frontend based on the Qt framework in which all typical calculations, such as the properties of the equation of state or the thermal fits, can be performed straightforwardly.

Additional comments: If the EV/vdW interactions are present, exact analytic calculations are presently only possible within the grand canonical ensemble. Approximate calculations are possible for the strangeness-canonical ensemble on the condition that strange particles form a small subsystem relative to the total system. Effects of probabilistic decays on fluctuation observables are generally included only up to moments of the 2nd order. The only exception is the ideal HRG model in the grand canonical ensemble, where these effects are included up to moments of the 4th order. The Monte Carlo event generator, on the other hand, is not subject to these restrictions.
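The summary states that systems of transcendental equations are solved with Broyden's method. As a rough illustration of that technique only, the sketch below applies the "good" Broyden update (in its inverse-Jacobian form) to a toy two-equation system, exp(x) − y = 0 and x + y − 1 = 0, whose solution is x = 0, y = 1. The toy system and the names residual, initial_inverse_jacobian and broyden_solve are assumptions made for this example; none of this is Thermal-FIST code, and the package's own solver may differ in every detail.

```cpp
#include <array>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

// Toy transcendental system (an assumption for illustration):
//   f1 = exp(x) - y,  f2 = x + y - 1,  with root (0, 1).
Vec2 residual(const Vec2& x) {
    return Vec2{std::exp(x[0]) - x[1], x[0] + x[1] - 1.0};
}

Vec2 mul(const Mat2& A, const Vec2& v) {
    return Vec2{A[0][0] * v[0] + A[0][1] * v[1],
                A[1][0] * v[0] + A[1][1] * v[1]};
}

// Forward-difference Jacobian at x, inverted analytically (2x2 case).
Mat2 initial_inverse_jacobian(const Vec2& x, double h = 1e-6) {
    const Vec2 f0 = residual(x);
    Mat2 J{};
    for (int j = 0; j < 2; ++j) {
        Vec2 xp = x;
        xp[j] += h;
        const Vec2 fp = residual(xp);
        for (int i = 0; i < 2; ++i) J[i][j] = (fp[i] - f0[i]) / h;
    }
    const double det = J[0][0] * J[1][1] - J[0][1] * J[1][0];
    Mat2 H{};
    H[0][0] =  J[1][1] / det;  H[0][1] = -J[0][1] / det;
    H[1][0] = -J[1][0] / det;  H[1][1] =  J[0][0] / det;
    return H;
}

// Broyden iteration: H approximates the inverse Jacobian and receives a
// rank-one correction per step instead of being recomputed.
Vec2 broyden_solve(Vec2 x, int max_iter = 100, double tol = 1e-12) {
    Mat2 H = initial_inverse_jacobian(x);
    Vec2 f = residual(x);
    for (int k = 0; k < max_iter; ++k) {
        if (std::hypot(f[0], f[1]) < tol) break;       // converged
        const Vec2 Hf = mul(H, f);
        const Vec2 s{-Hf[0], -Hf[1]};                  // quasi-Newton step
        const Vec2 x_new{x[0] + s[0], x[1] + s[1]};
        const Vec2 f_new = residual(x_new);
        const Vec2 y{f_new[0] - f[0], f_new[1] - f[1]};
        const Vec2 Hy = mul(H, y);
        const double denom = s[0] * Hy[0] + s[1] * Hy[1];  // s^T H y
        x = x_new;
        f = f_new;
        if (std::fabs(denom) < 1e-300) break;          // update undefined
        // Row vector s^T H, computed before H is modified.
        const Vec2 sTH{s[0] * H[0][0] + s[1] * H[1][0],
                       s[0] * H[0][1] + s[1] * H[1][1]};
        // H <- H + (s - H y)(s^T H) / (s^T H y)
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                H[i][j] += (s[i] - Hy[i]) * sTH[j] / denom;
    }
    return x;
}
```

Calling broyden_solve(Vec2{0.5, 0.5}) starts the iteration from (0.5, 0.5). The appeal of the method for larger systems is that the Jacobian is evaluated only once, with each later step costing a few matrix-vector products.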



Articles published on Linear Algebra Routines

X-OpenMP — eXtreme fine-grained tasking using lock-less work stealing

Quantitative Performance Analysis of BLAS Libraries on GPU Architectures (BLAS Kütüphanelerinin GPU Mimarilerindeki Nicel Performans Analizi)

Efficient computation of optical excitations in two-dimensional materials with the Xatu code

A quadratic decoder approach to nonintrusive reduced‐order modeling of nonlinear dynamical systems

Proximal Stabilized Interior Point Methods and Low-Frequency-Update Preconditioning Techniques

The triple decomposition of the velocity gradient tensor as a standardized real Schur form

Providing performance portable numerics for Intel GPUs

Discrete Lehmann representation of imaginary time Green's functions

QPALM: a proximal augmented lagrangian method for nonconvex quadratic programs

A fast data-driven method for genotype imputation, phasing and local ancestry inference: MendelImpute.jl.

A Heuristic Independent Particle Approximation to Determinantal Point Processes

A survey of numerical linear algebra methods utilizing mixed-precision arithmetic

Design and implementation of a modular interior-point solver for linear optimization

Efficient Auto-Tuning of Parallel Programs with Interdependent Tuning Parameters via Auto-Tuning Framework (ATF)

Study of Exploiting Coarse‐Grained Parallelism in Block‐Oriented Numerical Linear Algebra Routines

Integrating software and hardware hierarchies in an autotuning method for parallel routines in heterogeneous clusters

Thermal-FIST: A package for heavy-ion collisions and hadronic equation of state

BLASFEO

Modal Analysis of Fluid Flows: An Overview

Fast synchronization‐free algorithms for parallel sparse triangular solves with multiple right‐hand sides
