Abstract

A traditional goal of algorithmic optimality, squeezing out flops, has been superseded by evolution in architecture. Flops no longer serve as a reasonable proxy for all aspects of complexity. Instead, algorithms must now squeeze memory, data transfers, and synchronizations, while extra flops on locally cached data represent only small costs in time and energy. Hierarchically low-rank matrices realize a rarely achieved combination of optimal storage complexity and high computational intensity for a wide class of formally dense linear operators that arise in applications for which exascale computers are being constructed. They may be regarded as algebraic generalizations of the fast multipole method. Methods based on these hierarchical data structures and their simpler cousins, tile low-rank matrices, are well proportioned for early exascale computer architectures, which are provisioned for high processing power relative to memory capacity and memory bandwidth. They are ushering in a renaissance of computational linear algebra. A challenge is that emerging hardware architectures possess hierarchies of their own that do not generally align with those of the algorithm. We describe modules of a software toolkit, hierarchical computations on manycore architectures, that illustrate these features and are intended as building blocks of applications, such as matrix-free higher-order methods in optimization and large-scale spatial statistics. Some modules of this open-source project have been adopted in the software libraries of major vendors. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.

Summary

A renaissance in computational linear algebra

A renaissance has come to computational linear algebra in the form of hierarchically low-rank matrices (‘H-matrices’). The desire to extend computable problem sizes in spatial statistics has in recent years led researchers to consider far more drastic approximations of covariance matrices; see, e.g., the discussions in [4]. Such severe approximation is not necessary to scale to larger problem sizes, since hierarchically low-rank approximations allow navigation of the accuracy-capacity trade-off in a graceful way. Tile low-rank (‘TLR’) matrices are straightforward generalizations of the tiled matrix data structures that have proved fruitful in migrating dense linear algebra kernels of ScaLAPACK to the manycore shared-memory environment [7,8]. TLR instantly migrates the benefits of data sparsity within a tile to the rich libraries of tile-based kernels.
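The TLR idea described above can be sketched in a few lines: partition the matrix into tiles, replace each tile by truncated SVD factors when that saves memory, and keep it dense otherwise. The sketch below is illustrative only, assuming a simple exponential covariance kernel whose off-diagonal tiles are numerically low rank; the function names `tlr_compress` and `tlr_matvec` are hypothetical and do not reflect the authors' toolkit or any library API.

```python
import numpy as np

def tlr_compress(A, tile, tol=1e-6):
    """Compress a square matrix into tile low-rank (TLR) form.

    Each tile is replaced by truncated SVD factors (U, V); a tile whose
    numerical rank is too high for the factors to save memory is kept dense.
    Illustrative sketch only, not a production TLR implementation.
    """
    n = A.shape[0]
    tiles = {}
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            T = A[i:i + tile, j:j + tile]
            U, s, Vt = np.linalg.svd(T, full_matrices=False)
            k = max(1, int(np.sum(s > tol * s[0])))  # numerical rank at tol
            if 2 * k * tile < T.size:                # factors cheaper than dense?
                tiles[(i, j)] = ('lr', U[:, :k] * s[:k], Vt[:k, :])
            else:
                tiles[(i, j)] = ('dense', T, None)
    return tiles

def tlr_matvec(tiles, x, n, tile):
    """y = A @ x using the compressed tiles; a rank-k tile costs O(k * tile)."""
    y = np.zeros(n)
    for (i, j), (kind, F1, F2) in tiles.items():
        xj = x[j:j + tile]
        y[i:i + tile] += F1 @ (F2 @ xj) if kind == 'lr' else F1 @ xj
    return y

# Example: an exponential covariance kernel on points in [0, 1]; tiles away
# from the diagonal are (numerically) very low rank, so most storage is saved.
n, t = 256, 32
pts = np.linspace(0.0, 1.0, n)
A = np.exp(-np.abs(pts[:, None] - pts[None, :]))
tiles = tlr_compress(A, t)
```

Because the tiles remain the unit of storage and of work, each low-rank tile can be handed to the same task-based runtimes that schedule dense tile kernels, which is the sense in which TLR "instantly migrates" data sparsity into existing tile-based libraries.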

Tile low-rank representation
Hierarchical low-rank representation
Programming practices for exascale architectures
Findings
A hierarchical hourglass future