A Fast Dense Triangular Solve in CUDA

J D Hogg

doi:10.1137/12088358x

Open Access

https://doi.org/10.1137/12088358x

Journal: SIAM Journal on Scientific Computing: a publication of the Society for Industrial and Applied Mathematics
Publication Date: Jan 1, 2013
Citations: 8

#Sparse Direct Solvers #Direct Solver

Abstract

The level 2 BLAS operation _trsv performs a dense triangular solve and is often used in the solve phase of a direct solver following a matrix factorization. As manycore architectures reduce the cost of the compute-bound parts of the computation, memory-bound operations such as this kernel become increasingly important. This is particularly noticeable in sparse direct solvers used for optimization applications, where multiple memory-bound solves follow each (traditionally expensive) compute-bound factorization. In this paper, a high-performance implementation of the triangular solve is developed through an analysis of theoretical and practical bounds on its run time. This implementation outperforms the CUBLAS by a factor of 5 to 15.
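To make concrete what a _trsv-style solve computes, here is a minimal sequential sketch in C of column-oriented forward substitution for a lower-triangular system L x = b. It is only a reference illustration, not the GPU implementation developed in the paper; the function name `trsv_lower_ref` and the assumptions of column-major storage and a non-unit diagonal are ours.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not from the paper): solve L x = b in place
 * for a dense lower-triangular matrix L with non-unit diagonal, stored
 * column-major with leading dimension lda.
 * On entry x holds b; on exit x holds the solution. */
void trsv_lower_ref(int n, const double *L, int lda, double *x)
{
    assert(n >= 0 && lda >= n);
    for (int j = 0; j < n; ++j) {
        const double *col = L + (size_t)j * (size_t)lda;
        x[j] /= col[j];                /* x_j = (updated b_j) / L_jj */
        for (int i = j + 1; i < n; ++i)
            x[i] -= col[i] * x[j];     /* remove x_j's contribution from later rows */
    }
}
```

Each stored entry of L is read once and takes part in a single multiply-subtract (or one division on the diagonal), so there is very little arithmetic per byte loaded. That low ratio is what makes the operation memory-bound, as the abstract notes.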

Full Text