Abstract

Batched sparse linear algebra has emerged as a processing mode on modern hardware accelerators based on Graphics Processing Units (GPUs), which have developed over the years to serve as the main compute devices in the largest computing clusters and supercomputers. We propose a set of solver interface designs for batched sparse numerical solvers on these hardware accelerators. We motivate our specific designs both by their use in scientific applications of national importance and by the possibility of implementing them in an efficient and portable manner with multiple options for vendor-specific optimizations. We present the C language interface calls for the linker-agnostic interchange of functional entry points. We also show how using C++ for the batched solvers simplifies the interface design while giving the user a much broader set of opportunities for customization, testing, and debugging. Our proposals also cover the option of exploiting multiple floating-point arithmetic precisions to directly match the accuracy needs of the application. Finally, a selected sample of performance experiments shows how our proposed interface can be efficiently implemented to outperform the available alternatives many times over. We plan for an ongoing evolution of our newly proposed interface standard to keep up with updates in programming languages, accelerator hardware, and application needs.

