Abstract
We apply object-oriented software design patterns to develop code for scientific software involving sparse matrices. Design patterns arise when multiple independent developments produce similar designs that converge on a generic solution. We demonstrate how to use design patterns to implement an interface for sparse matrix computations on NVIDIA GPUs starting from PSBLAS, an existing sparse matrix library, and from existing sets of GPU kernels for sparse matrices. We also compare, on two GPU-equipped platforms, the throughput of the PSBLAS sparse matrix–vector multiplication with that obtained by a CPU-only PSBLAS implementation. Our experiments show encouraging results for the comparison between CPU and GPU executions in double precision, obtaining a speedup of up to 35.35 on an NVIDIA GTX 285 with respect to an AMD Athlon 7750, and up to 10.15 on an NVIDIA Tesla C2050 with respect to an Intel Xeon X5650.
Highlights
Computational scientists concern themselves with producing science, even when a significant percentage of their time goes to engineering software
This paper demonstrates how well-known software engineering design patterns can be used to implement an interface for sparse matrix computations on Graphics Processing Units (GPUs) starting from an existing, non-GPU-enabled library
Our reported experience demonstrates that the application of design patterns facilitated a significant reduction in the development effort in the presented context
We present some experimental performance results on different NVIDIA platforms demonstrating the throughput improvement achieved by implementing the Parallel Sparse Basic Linear Algebra Subroutines (PSBLAS) interface for sparse matrix computations on GPUs
Summary
Computational scientists concern themselves with producing science, even when a significant percentage of their time goes to engineering software. We discuss how to employ the considered design patterns to interface the existing PSBLAS library with a plug-in, written in the Compute Unified Device Architecture (CUDA) C language, that implements the computational kernels on NVIDIA GPUs. Our reported experience demonstrates that the application of design patterns facilitated a significant reduction in the development effort in the presented context; we present some experimental performance results on different NVIDIA platforms demonstrating the throughput improvement achieved by implementing the PSBLAS interface for sparse matrix computations on GPUs. The software described in this paper is available at http://www.ce.uniroma2.it/psblas. The rest of the paper is organized as follows: Section 2 describes several design patterns; Section 3 provides some background on GPUs and presents the interfaces for sparse-matrix computations on GPUs, starting from the PSBLAS library and focusing on matrix–vector multiplication with code examples; Section 4 demonstrates the utility and performance benefits accrued by use of the presented patterns; and Section 5 concludes the paper and gives hints for future work.