Abstract

Sparse matrix-vector multiplication (SpMV) is used to solve linear systems and eigenvalue problems of widely varying scale that arise in numerous scientific applications. One such application is Configuration Interaction (CI), a linear method for solving the nonrelativistic Schrödinger equation for quantum-chemical multi-electron systems; it can treat the ground state as well as multiple excited states. In this paper, we develop a hybrid approach for handling CI sparse matrices. The proposed model consists of a newly developed hybrid format for storing CI sparse matrices on the Graphics Processing Unit (GPU), together with an SpMV kernel that multiplies a CI matrix stored in the proposed format by a vector, implemented in the C language on the Compute Unified Device Architecture (CUDA) platform. The proposed SpMV kernel is a vector kernel that uses the warp approach. We evaluate the new model in terms of two primary factors, memory usage and performance; compared against the cuSPARSE library and the CSR5 (Compressed Sparse Row 5) format, our kernel outperforms both.
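The paper's kernel itself is not reproduced on this page. As background for the "vector kernel with the warp approach" mentioned above, the sketch below shows a generic warp-per-row CSR SpMV kernel; the name csr_spmv_warp, the CSR array names, and the fixed warp size of 32 are assumptions for illustration and do not reflect the paper's hybrid format.

// Generic warp-per-row CSR SpMV sketch: one 32-thread warp processes one matrix row.
// Assumed layout: row_ptr (n_rows + 1 entries), col_idx and val (one entry per nonzero),
// x (dense input vector), y (output vector).
__global__ void csr_spmv_warp(int n_rows, const int *row_ptr, const int *col_idx,
                              const double *val, const double *x, double *y)
{
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = tid & 31;          // position of this thread within its warp
    int row  = tid >> 5;          // one warp per row

    if (row < n_rows) {
        double sum = 0.0;
        // Each lane accumulates a strided subset of the row's nonzeros.
        for (int j = row_ptr[row] + lane; j < row_ptr[row + 1]; j += 32)
            sum += val[j] * x[col_idx[j]];

        // Warp-level reduction of the partial sums.
        for (int offset = 16; offset > 0; offset >>= 1)
            sum += __shfl_down_sync(0xffffffff, sum, offset);

        if (lane == 0)
            y[row] = sum;         // lane 0 writes the final result for this row
    }
}

Assigning a whole warp to each row (rather than a single thread) keeps memory accesses to col_idx and val coalesced, which is the usual motivation for "vector" SpMV kernels on GPUs.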

Highlights

  • A Graphics Processing Unit (GPU) is an electronic chip designed for extremely fast parallel computation and data processing

  • We propose a new model for storing Configuration Interaction (CI) sparse matrices on the GPU

  • We have implemented the Sparse matrix-vector multiplication (SpMV) kernel for the proposed model


Summary

Introduction

A Graphics Processing Unit (GPU) is an electronic chip designed for extremely fast parallel computation and data processing. The constant memory can be accessed by all the threads within the grid, just like the global memory. The CPU launches multiple copies of a kernel on the GPU, one per parallel thread, to process the data residing on the GPU. For example, a simple GPU kernel (written in CUDA) called AddArrays can be used to add two integer arrays: the CPU launches multiple copies of the AddArrays kernel on parallel threads (one copy per thread) to perform the addition (see the sketch after this paragraph). Although CUDA allows us to run millions of threads or more, programs that run on the GPU are not a million times faster than on the CPU, for several reasons; among them, it takes time to copy data from the CPU to the GPU and vice versa. Improving the SpMV operation is, in fact, extremely critical to the performance of a variety of scientific applications.
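The original AddArrays listing is not reproduced on this page; a minimal sketch of such a kernel, assuming one array element per thread and a length parameter n, could look like this:

// Sketch of a CUDA kernel that adds two integer arrays element-wise.
// Each thread computes one output element; n is the array length (assumed parameter).
__global__ void AddArrays(const int *a, const int *b, int *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard threads past the array end
        c[i] = a[i] + b[i];
}

// Example launch: enough 256-thread blocks to cover n elements (device pointers assumed).
// AddArrays<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);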

The Schrödinger Equation
The CI Matrix Elements
The CI Matrix
The Proposed Work
Common Formats
The Sliced ELLPACK Format
The Sliced ELLPACK-R Format
The Proposed Model
The Developed SpMV Kernel
System Configuration
The Results
Conclusions
Future Work
