Abstract

We present techniques for supernodal sparse Cholesky factorization on a hybrid platform consisting of a multicore CPU and a GPU. The techniques are the subtree algorithm, pipelining, and multithreading. The subtree algorithm [15] minimizes PCIe transfers by storing an entire branch of the elimination tree in GPU memory (the elimination tree is a tree data structure that describes the workflow of the factorization), and also reduces the total kernel launch time by launching BLAS kernels in batches. The pipelining technique overlaps the execution of GPU kernels with PCIe data transfers. The multithreading technique [17] creates multiple threads for both the CPU and the GPU to exploit the concurrency of the elimination tree. Our experimental results on a platform consisting of an Intel multicore processor and an Nvidia GPU indicate a significant improvement in performance and energy consumption over CHOLMOD (SuiteSparse 4.5.3), a sparse Cholesky factorization package, after these techniques are applied.
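To make the elimination tree mentioned in the abstract concrete, the following is a minimal sketch of the standard ancestor-compression method for computing it from the nonzero pattern of a symmetric matrix. It is an illustration only, not the implementation evaluated in the paper; the function names `elimination_tree` and `children` and the list-of-columns input format are assumptions chosen for brevity.

```python
from typing import List


def elimination_tree(upper_cols: List[List[int]]) -> List[int]:
    """Return parent[] of the elimination tree; -1 marks a root.

    upper_cols[k] lists the row indices i of the nonzeros A[i, k] in
    column k of a symmetric matrix. Only entries with i < k are used.
    (Hypothetical input format, chosen for this sketch.)
    """
    n = len(upper_cols)
    parent = [-1] * n
    ancestor = [-1] * n  # compressed shortcut towards the current subtree root
    for k in range(n):
        for i in upper_cols[k]:
            # Climb from i to the root of its current subtree and attach it to k.
            while i != -1 and i < k:
                nxt = ancestor[i]
                ancestor[i] = k      # path compression
                if nxt == -1:
                    parent[i] = k    # i was a root; k becomes its parent
                i = nxt
    return parent


def children(parent: List[int]) -> List[List[int]]:
    """Invert parent[] so that each node lists its children."""
    kids: List[List[int]] = [[] for _ in parent]
    for node, p in enumerate(parent):
        if p != -1:
            kids[p].append(node)
    return kids


if __name__ == "__main__":
    # "Arrow" pattern: column 3 couples to rows 0, 1 and 2.
    arrow = [[], [], [], [0, 1, 2]]
    tree = elimination_tree(arrow)
    print(tree)            # parent of every node
    print(children(tree))  # nodes 0, 1, 2 are independent leaves under node 3
```

In the arrow example, columns 0, 1 and 2 have no dependencies on one another, so they form independent branches of the tree. Independent branches of this kind are what allow a branch to be processed as a unit and separate branches to be factorized concurrently.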
