Abstract

This work presents an algorithm for solving triangular systems of equations with multiple right-hand sides. This problem, commonly referred to as TRSM, is very important in dense linear algebra because it is a subroutine in most matrix decompositions, such as LU or QR. To improve performance over the standard iterative algorithms for TRSM, a block-wise inversion paired with triangular matrix multiplications is used. To perform the inversion, the lower triangular form of the matrix is exploited and a recursive scheme is applied to further decrease communication cost. As a result, the latency of the algorithm decreases while the bandwidth cost and the floating-point operation count stay asymptotically the same. Concretely, latency is reduced by a factor of p^{2/3} / log p for a significant range of relative matrix sizes when working with p processors. The proposed method is implemented and its performance is benchmarked against the widely used ScaLAPACK library. The results show promising tendencies for the inversion, with a maximal speedup of 1.7 over ScaLAPACK on 4096 processors. Because the triangular matrix multiplications currently perform worse than the triangular solve, no overall improvement has been achieved yet.
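The idea of replacing substitution steps with block inversion and matrix multiplication can be illustrated with a minimal sequential sketch. It is not the distributed implementation benchmarked in this work: the function names `invert_lower` and `block_trsm`, the `cutoff` and `block` parameters, and the use of NumPy are all assumptions made for illustration. The recursion relies on the standard identity that a lower triangular matrix with diagonal blocks L11, L22 and off-diagonal block L21 has an inverse with diagonal blocks inv(L11), inv(L22) and off-diagonal block -inv(L22) L21 inv(L11).

```python
import numpy as np


def invert_lower(L, cutoff=2):
    """Recursively invert a lower triangular matrix (illustrative sketch)."""
    n = L.shape[0]
    if n <= cutoff:
        # Base case: small block, solve directly against the identity.
        return np.linalg.solve(L, np.eye(n))
    h = n // 2
    L11, L21, L22 = L[:h, :h], L[h:, :h], L[h:, h:]
    # The two diagonal blocks are independent and can be inverted separately.
    inv11 = invert_lower(L11, cutoff)
    inv22 = invert_lower(L22, cutoff)
    out = np.zeros((n, n), dtype=float)
    out[:h, :h] = inv11
    out[h:, h:] = inv22
    # Off-diagonal block of the inverse: -inv(L22) @ L21 @ inv(L11).
    out[h:, :h] = -inv22 @ L21 @ inv11
    return out


def block_trsm(L, B, block=2):
    """Solve L X = B by inverting diagonal blocks and multiplying."""
    n = L.shape[0]
    X = np.array(B, dtype=float)
    for s in range(0, n, block):
        e = min(s + block, n)
        # Apply the inverted diagonal block instead of forward substitution.
        X[s:e] = invert_lower(L[s:e, s:e]) @ X[s:e]
        # Update the remaining right-hand sides with a matrix multiplication.
        X[e:] -= L[e:, s:e] @ X[s:e]
    return X
```

In this sketch every step inside the loop is a matrix product, which is the property the blocked approach exploits; how the blocks are distributed and communicated across p processors is outside its scope.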
