Explaining the Performance of Supervised and Semi-Supervised Methods for Automated Sparse Matrix Format Selection

Akshay Deodhar, Johannes Langguth, Konstantin Pogorelov, Sunidhi Dhandhania, Swarnendu Biswas

doi:10.1145/3458744.3474049

Abstract

The performance of sparse matrix-vector multiplication (SpMV) kernels depends on the sparse matrix storage format and on the architecture and memory hierarchy of the target processor. Many sparse matrix storage formats, along with corresponding SpMV algorithms, have been proposed to improve SpMV performance. Given a sparse matrix and a target architecture, supervised machine learning techniques can automate the selection of the best format. However, existing supervised approaches suffer from several drawbacks. They depend on large representative datasets and are expensive to train. In addition, retraining to incorporate new classes of matrices or different processor architectures is just as costly, since new training data must be generated by benchmarking many instances. Furthermore, the results of many supervised systems are hard to understand. We propose using semi-supervised machine learning techniques for format selection. We highlight the challenges in using K-Means clustering for the sparse format selection problem and show how to adapt the algorithm to improve its performance. An empirical evaluation shows that our proposed semi-supervised approach is competitive with supervised methods while also providing flexibility and explainability.

Full Text