Abstract

We apply object-oriented software design patterns to develop code for scientific software involving sparse matrices. Design patterns arise when multiple independent developments produce similar designs that converge onto a generic solution. We demonstrate how to use design patterns to implement an interface for sparse matrix computations on NVIDIA GPUs, starting from PSBLAS, an existing sparse matrix library, and from existing sets of GPU kernels for sparse matrices. We also compare the throughput of the PSBLAS sparse matrix–vector multiplication on two GPU-equipped platforms with that obtained by a CPU-only PSBLAS implementation. In double precision, the GPU executions achieve a speedup of up to 35.35 on an NVIDIA GTX 285 with respect to an AMD Athlon 7750, and up to 10.15 on an NVIDIA Tesla C2050 with respect to an Intel Xeon X5650.

Highlights

  • Computational scientists concern themselves with producing science, even when a significant percentage of their time goes to engineering software

  • This paper demonstrates how well-known software engineering design patterns can be used to implement an interface for sparse matrix computations on Graphics Processing Units (GPUs) starting from an existing, non-GPU-enabled library

  • Our reported experience demonstrates that the application of design patterns facilitated a significant reduction in the development effort in the presented context; we present some experimental performance results on different NVIDIA platforms demonstrating the throughput improvement achieved by implementing the Parallel Sparse Basic Linear Algebra Subroutines (PSBLAS) interface for sparse matrix computations on GPUs


Summary

Introduction

Computational scientists concern themselves with producing science, even when a significant percentage of their time goes to engineering software. We discuss how to employ design patterns to interface the existing PSBLAS library with a plug-in, written in the Compute Unified Device Architecture (CUDA) C language, that implements the computational kernels on NVIDIA GPUs. Our reported experience demonstrates that applying design patterns significantly reduced the development effort in this context; we also present experimental performance results on different NVIDIA platforms that show the throughput improvement achieved by implementing the PSBLAS interface for sparse matrix computations on GPUs. The software described in this paper is available at http://www.ce.uniroma2.it/psblas. The rest of the paper is organized as follows: Section 2 describes several design patterns; Section 3 provides some background on GPUs and presents the interfaces for sparse-matrix computations on GPUs, starting with the PSBLAS library and focusing on matrix–vector multiplication with code examples; Section 4 demonstrates the utility and performance benefits accrued by use of the presented patterns; and Section 5 concludes the paper and gives hints for future work.
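
To make the idea of a format-independent interface concrete, the following is a minimal sketch, assuming a design in which the storage format sits behind an abstract class and client code calls only an outer matrix object. It is written in C++ for illustration and is not the PSBLAS API; the names SparseStorage, CsrStorage, SparseMatrix, set_storage and make_example_matrix are all hypothetical. A GPU-resident format would, under this assumption, be one more subclass whose spmv() launches a device kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical abstract storage format (not a PSBLAS type). Every concrete
// format, CPU or GPU, would implement the same spmv() contract.
class SparseStorage {
public:
    virtual ~SparseStorage() = default;
    // Returns y = A * x.
    virtual std::vector<double> spmv(const std::vector<double>& x) const = 0;
};

// CPU-only compressed sparse row (CSR) storage.
class CsrStorage : public SparseStorage {
public:
    CsrStorage(std::size_t n_rows, std::size_t n_cols,
               std::vector<std::size_t> row_ptr,
               std::vector<std::size_t> col_idx,
               std::vector<double> values)
        : n_rows_(n_rows), n_cols_(n_cols), row_ptr_(std::move(row_ptr)),
          col_idx_(std::move(col_idx)), values_(std::move(values)) {
        // Basic consistency checks on the CSR arrays.
        assert(row_ptr_.size() == n_rows_ + 1);
        assert(col_idx_.size() == values_.size());
        assert(row_ptr_.back() == values_.size());
    }

    std::vector<double> spmv(const std::vector<double>& x) const override {
        assert(x.size() == n_cols_);
        std::vector<double> y(n_rows_, 0.0);
        // Row i owns the nonzeros in positions [row_ptr_[i], row_ptr_[i+1]).
        for (std::size_t i = 0; i < n_rows_; ++i) {
            for (std::size_t k = row_ptr_[i]; k < row_ptr_[i + 1]; ++k) {
                y[i] += values_[k] * x[col_idx_[k]];
            }
        }
        return y;
    }

private:
    std::size_t n_rows_;
    std::size_t n_cols_;
    std::vector<std::size_t> row_ptr_;
    std::vector<std::size_t> col_idx_;
    std::vector<double> values_;
};

// Outer matrix object: callers use multiply() and never see the format.
// The storage can be replaced at run time without changing client code.
class SparseMatrix {
public:
    explicit SparseMatrix(std::unique_ptr<SparseStorage> storage)
        : storage_(std::move(storage)) {}

    void set_storage(std::unique_ptr<SparseStorage> storage) {
        storage_ = std::move(storage);
    }

    std::vector<double> multiply(const std::vector<double>& x) const {
        return storage_->spmv(x);
    }

private:
    std::unique_ptr<SparseStorage> storage_;
};

// Builds the 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form.
SparseMatrix make_example_matrix() {
    return SparseMatrix(std::make_unique<CsrStorage>(
        2, 3,
        std::vector<std::size_t>{0, 2, 3},
        std::vector<std::size_t>{0, 2, 1},
        std::vector<double>{1.0, 2.0, 3.0}));
}
```

Calling make_example_matrix().multiply({1.0, 1.0, 1.0}) runs the CSR loop on the CPU. In this sketch, moving the computation to another device would mean passing a different SparseStorage subclass to set_storage, with the calling code left unchanged.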

Design patterns
Interfacing sparse-matrix computational kernels on GPUs with PSBLAS
Overview of NVIDIA GPU architecture and programming model
Sparse matrix computations on a GPU
Vectors on the GPU
Interfacing to the NVIDIA cuSPARSE library
Performance results
Human performance results
Machine performance results
Findings
Conclusions
