Abstract

The LU decomposition is a popular linear algebra technique with applications such as solving systems of linear equations and calculating matrix inverses and determinants. Central processing unit (CPU) implementations of this routine already achieve very high performance, which makes outperforming them with a port to a graphics processing unit (GPU) a challenging prospect. This chapter discusses the implementation of LU decomposition in CULA, a library for linear algebra on the GPU. It describes the steps necessary to achieve significant speed-ups over the CPU, including the specialized techniques CULA employs to obtain significant speed-ups over existing packages. CULA features a wide variety of linear algebra functions, including least squares solvers (constrained and unconstrained), system solvers (general and symmetric positive definite), eigenproblem solvers (general and symmetric), singular value decompositions, and many useful factorizations (QR, Hessenberg). The chapter also presents a number of methods for interfacing with CULA. The two major interfaces are host and device, which accept data via host memory and device memory, respectively. The host interface offers greater convenience, whereas the device interface is more manual but can avoid the cost of transferring data between host and device. Additionally, there are facilities for interfacing with MATLAB and the Fortran language.
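To make the opening claim concrete, here is a minimal pure-Python sketch of LU factorization with partial pivoting (PA = LU), showing how a single factorization yields a linear solve, a determinant, and an inverse. It is an unblocked, textbook CPU version for illustration only. It is not CULA's GPU implementation and does not use CULA's API, and the function names `lu_decompose`, `lu_solve`, `lu_det`, and `lu_inverse` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_decompose(a: Matrix) -> Tuple[Matrix, List[int], int]:
    """Factor PA = LU with partial pivoting (hypothetical helper, not CULA's API).

    Returns (lu, perm, swaps): lu holds L strictly below the diagonal (L has an
    implicit unit diagonal) and U on and above it; perm[i] is the original row
    index now at position i; swaps counts row exchanges.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]  # work on a float copy
    perm = list(range(n))
    swaps = 0
    for k in range(n):
        # Partial pivoting: pick the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
            swaps += 1
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier l_ik, stored in place
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]  # update trailing submatrix
    return lu, perm, swaps


def lu_solve(lu: Matrix, perm: List[int], b: List[float]) -> List[float]:
    """Solve Ax = b using the factors: Ly = Pb, then Ux = y."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):  # forward substitution (unit lower triangular L)
        y[i] = float(b[perm[i]]) - sum(lu[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution (upper triangular U)
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


def lu_det(lu: Matrix, swaps: int) -> float:
    """det(A) = (-1)^swaps * product of U's diagonal."""
    det = -1.0 if swaps % 2 else 1.0
    for i in range(len(lu)):
        det *= lu[i][i]
    return det


def lu_inverse(lu: Matrix, perm: List[int]) -> Matrix:
    """Invert A by solving A x_j = e_j for each unit vector e_j."""
    n = len(lu)
    cols = [lu_solve(lu, perm, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    # cols[j] is column j of the inverse; transpose into row-major form.
    return [[cols[j][i] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    A = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
    lu, perm, swaps = lu_decompose(A)
    print(lu_solve(lu, perm, [5.0, -2.0, 9.0]))  # [1.0, 1.0, 2.0]
    print(lu_det(lu, swaps))                     # -16.0
```

Factoring costs O(n^3) operations, while each pair of triangular solves costs only O(n^2), so one factorization can be reused for many right-hand sides. This reuse is why the same routine underlies solving, inversion, and determinant calculation.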
