Linear Algebra Programming Research Articles

GraphBLAS is a recent standard that allows the expression of graph algorithms in the language of linear algebra and enables automatic code parallelization and optimization. GraphBLAS operations are memory bound and may benefit from data locality optimizations enabled by nonblocking execution. However, nonblocking execution remains under-evaluated. In this article, we present a novel design and implementation that investigates nonblocking execution in GraphBLAS. Lazy evaluation enables runtime optimizations that improve data locality, and dynamic data dependence analysis identifies operations that may reuse data in cache. The nonblocking execution of an arbitrary number of operations results in dynamic parallelism, and its performance depends on two parameters, which are determined automatically at runtime by a proposed analytic model. The evaluation confirms the importance of nonblocking execution for three algorithms on various matrices, showing up to 4.11× speedup over blocking execution as a result of better cache utilization. With the proposed analytic model, nonblocking execution reaches up to 5.13× speedup over blocking execution. The fully automatic performance is very close to that obtained with the best manual configuration for both small and large matrices. Finally, the evaluation includes a comparison with other state-of-the-art frameworks for numerical linear algebra programming that employ parallel execution and optimizations similar to those discussed in this work, and the presented nonblocking execution reaches up to 16.1× speedup over the state of the art.
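The idea of lazy evaluation followed by cache-friendly execution can be illustrated with a minimal sketch. It is not the GraphBLAS API and not the authors' implementation: `LazyPipeline`, `apply`, `execute`, and `tile_size` are hypothetical names, and `tile_size` is an illustrative parameter that is not necessarily one of the two parameters the abstract mentions. The sketch only records element-wise operations and later runs all of them on one tile of the vector before moving to the next tile, whereas blocking execution sweeps the whole vector once per operation.

```python
from typing import Callable, List


class LazyPipeline:
    """Hypothetical helper: records element-wise operations, runs them later."""

    def __init__(self, tile_size: int = 4) -> None:
        self.tile_size = tile_size  # illustrative tuning parameter (assumption)
        self.stages: List[Callable[[float], float]] = []

    def apply(self, fn: Callable[[float], float]) -> "LazyPipeline":
        # Lazy evaluation: only remember the operation, compute nothing yet.
        self.stages.append(fn)
        return self

    def execute(self, data: List[float]) -> List[float]:
        out = list(data)
        # Visit one tile at a time and apply every recorded stage to it,
        # so a tile is reused by all stages before the next tile is touched.
        for start in range(0, len(out), self.tile_size):
            end = min(start + self.tile_size, len(out))
            for fn in self.stages:
                for i in range(start, end):
                    out[i] = fn(out[i])
        return out


def blocking(data: List[float], fns: List[Callable[[float], float]]) -> List[float]:
    # Blocking execution: each operation sweeps the full vector before the next starts.
    out = list(data)
    for fn in fns:
        out = [fn(x) for x in out]
    return out


def scale(x: float) -> float:
    return 2.0 * x


def shift(x: float) -> float:
    return x + 1.0


x = [float(i) for i in range(10)]
fused = LazyPipeline(tile_size=4).apply(scale).apply(shift).execute(x)
reference = blocking(x, [scale, shift])
```

Both orders produce the same values here because the operations are element-wise and independent across elements. Operations with cross-element dependences would need the kind of dependence analysis the abstract describes, which this sketch does not attempt.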

The Sunway TaihuLight supercomputer has been in operation for several years, and many applications have been ported to or built for it. Initially, most applications running on TaihuLight had regular memory access patterns, such as dense linear algebra, structured grids, and dynamic programming. In 2018, developers published a general-purpose graph processing framework, a port of LAMMPS, and a sparse triangular solver. These applications have irregular memory access patterns and need a lot of special processing to make use of the computing processing elements (CPEs) of TaihuLight. While those strategies are efficient, such processing may be difficult to apply to a wider range of applications, especially constantly changing molecular dynamics applications or dynamic unstructured grids. In this paper, we present the design of a general-purpose software cache library, SWCache, which simplifies applying a software cache in kernels, together with a series of tools for tuning and modelling the performance of our software cache. After a series of optimizations, including reordering branches for better branch prediction and hand-tuning register allocation, we evaluate our implementation in two mini-apps: miniFE and miniMD. Experiments show that our tuned software cache library can be applied in these applications and provides a 20% speedup in miniMD compared to the strategies in a previous port of LAMMPS. Using our library also reduces the amount of code that has to be written. In addition, the experience of efficient macro-based programming should be valuable for further application development on CPEs, which lack C++ support.
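As a rough illustration of what a software cache does for irregular accesses, the following sketch models a direct-mapped cache in front of a slow backing array. It is an assumption-laden toy, not SWCache: the abstract indicates that the library is macro-based and targets CPEs, and the class name `SoftwareCache`, its parameters, and the direct-mapped policy are all hypothetical choices made for this example.

```python
from typing import List, Optional


class SoftwareCache:
    """Hypothetical direct-mapped software cache over a backing list."""

    def __init__(self, backing: List[int], num_lines: int = 4, line_size: int = 4) -> None:
        self.backing = backing        # stands in for slow main memory
        self.num_lines = num_lines    # number of cache lines (assumption)
        self.line_size = line_size    # elements per cache line (assumption)
        self.tags: List[Optional[int]] = [None] * num_lines
        self.lines: List[List[int]] = [[] for _ in range(num_lines)]
        self.hits = 0
        self.misses = 0

    def read(self, index: int) -> int:
        block = index // self.line_size   # which block of memory holds the element
        slot = block % self.num_lines     # direct mapping: one possible slot per block
        if self.tags[slot] != block:
            # Miss: fetch the whole block, replacing whatever occupied the slot.
            self.misses += 1
            start = block * self.line_size
            self.lines[slot] = self.backing[start:start + self.line_size]
            self.tags[slot] = block
        else:
            self.hits += 1
        return self.lines[slot][index % self.line_size]


memory = list(range(64))
cache = SoftwareCache(memory)
# Indices 0 and 17 fall in blocks 0 and 4, which share slot 0 and evict each other.
values = [cache.read(i) for i in [0, 1, 2, 3, 17, 0, 1]]
```

In this access sequence the neighbouring reads hit in the cached line, while the two conflicting blocks cause repeated misses. Tuning line size and line count against such conflicts is the sort of trade-off a performance model for a software cache would have to capture.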

Articles published on Linear Algebra Programming

Design and Implementation for Nonblocking Execution in GraphBLAS: Tradeoffs and Performance

Synthesis of Incremental Linear Algebra Programs

Tuning a general purpose software cache library for TaihuLight’s SW26010 processor

On the Existence of Block-Diagonal Solutions to Lyapunov and ${\mathcal {H}_\infty }$ Riccati Inequalities

Theorems of the alternative revisited

Model Study and Analysis Based on Student Performance Data under SPSS Statistics— Take a College from Hangzhou Normal University as an Example

PolyJIT: Polyhedral Optimization Just in Time

On optimizing operator fusion plans for large-scale machine learning in SystemML

Varimax Rotation and Beyond: A Tutorial on PCA Using Linear Algebra, Visualization, and Python Programming for R and Q Analysis

A structure-preserving pivotal method for affine variational inequalities

Graph Programming Interface (GPI): A Linear Algebra Programming Model for Large Scale Graph Computations

Matching Bills of Materials Using Tree Reconciliation

Asset Pricing, Financial Markets, and Linear Algebra

A LINEAR APPROACH TO LIE TRIPLE AUTOMORPHISMS OF H*-ALGEBRAS

Problem Based Learning (PBL): Analysis of Continuous Stirred Tank Chemical Reactors with a Process Control Approach

Generalized switch-setting problems

Computing photonic band structures by Dirichlet-to-Neumann maps: The triangular lattice

A cost-effective implementation of multilevel tiling

Introducing parallel manipulators through laboratory experiments

Learning missing values from summary constraints
