Abstract

Sparse matrix-vector multiplication (SpMV) is a critical operation in scientific computing and engineering applications. Compressed Sparse Row (CSR) is the most popular sparse storage format, and CSR-based SpMV usually performs well on sparse matrices with a large number of non-zero elements. This paper presents our Multi-GPU SpMV implementation, which improves CSR-based SpMV performance. We use multiple GPUs to jointly complete SpMV computations and adopt a streamed approach that increases concurrency to further improve performance. We evaluate our Multi-GPU SpMV on a collection of fourteen sparse matrices and demonstrate the effectiveness of the proposed approach on a large-scale cluster. The average speedup achieved in our experiments is 6.68.
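
For readers unfamiliar with the format, the following is a minimal CPU-only sketch of CSR-based SpMV with the rows split into contiguous blocks, as one might do before handing each block to a separate device. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names (`csr_spmv`, `partition_rows_by_nnz`, `partitioned_spmv`) and the non-zero-balanced partitioning rule are hypothetical, and the per-block loop runs sequentially where a real multi-GPU version would run blocks concurrently.

```python
def csr_spmv(row_ptr, col_idx, vals, x, row_start=0, row_end=None):
    """Compute rows [row_start, row_end) of y = A @ x, with A stored in CSR."""
    if row_end is None:
        row_end = len(row_ptr) - 1
    y = []
    for i in range(row_start, row_end):
        acc = 0.0
        # Non-zeros of row i live in vals[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y


def partition_rows_by_nnz(row_ptr, parts):
    """Split rows into contiguous blocks with roughly equal non-zero counts.

    Assumption: balancing by non-zeros is one plausible rule, chosen here
    only for illustration.
    """
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    bounds = [0]
    for p in range(1, parts):
        target = nnz * p / parts
        r = bounds[-1]
        while r < n_rows and row_ptr[r] < target:
            r += 1
        bounds.append(r)
    bounds.append(n_rows)
    return bounds


def partitioned_spmv(row_ptr, col_idx, vals, x, parts):
    """Run csr_spmv on each row block and concatenate the partial results."""
    bounds = partition_rows_by_nnz(row_ptr, parts)
    y = []
    for p in range(parts):  # each iteration stands in for one device
        y.extend(csr_spmv(row_ptr, col_idx, vals, x, bounds[p], bounds[p + 1]))
    return y


# Example: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = partitioned_spmv(row_ptr, col_idx, vals, x, parts=2)
```

Because each row block only writes its own slice of the output vector, the blocks are independent, which is what makes a row-wise split a natural fit for multiple devices.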

