Batched sparse linear algebra has emerged as a processing mode on modern hardware accelerators based on Graphics Processing Units (GPUs), which have developed over the years into the main compute devices in the largest computing clusters and supercomputers. We propose a set of interface designs for batched sparse numerical solvers on these hardware accelerators. We motivate our specific designs both by their use in scientific applications of national importance and by the possibility of implementing them in an efficient and portable manner with multiple options for vendor-specific optimizations. We present the C language interface calls for the linker-agnostic interchange of functional entry points. We also show how using C++ for the batched solvers simplifies the interface design while giving the user a much broader set of opportunities for customization, testing, and debugging. Our proposals also cover the option of exploiting multiple floating-point arithmetic precisions to directly match the application's accuracy needs. Finally, a selected sample of performance experiments shows how our proposed interface can be efficiently implemented to outperform the available alternatives many times over. We plan for an ongoing evolution of our newly proposed interface standard to keep up with updates in programming languages, accelerator hardware, and application needs.