Abstract
We explore the use of synthetic benchmarks for the training phase of machine-learning-based automatic performance tuning. We focus on the problem of predicting whether the use of local memory on a GPU is beneficial for caching a single target array in a GPU kernel. We show that the use of only 13 real benchmarks leads to poor prediction accuracy (about 58%) for the 13 leave-one-out models trained using these benchmarks, even when the model features are sufficiently comprehensive. We define a metric, called the average vicinity density, to measure the quality of a training set. We then use it to demonstrate that the poor accuracy of the models built with the real benchmarks is indeed due to the limited size and coverage of the training set. In contrast, the use of a properly generated set of 90K synthetic benchmarks leads to significantly better accuracy, up to 87%. These results validate our approach of using synthetic benchmarks for training machine learning models. We describe a synthetic benchmark template for the local memory optimization. We then present two approaches that use this template and a seed set of real benchmarks to generate a large number of synthetic benchmarks. We also explore the impact of the number of synthetic benchmarks used in training.
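To make the target optimization concrete, the sketch below shows the kind of transformation whose profitability the paper's models predict: staging a single, repeatedly read target array in on-chip local memory before the compute loop. It is a minimal illustration written in CUDA (where OpenCL's "local memory" is called "shared memory"); the kernel, array names, and sizes are hypothetical and not taken from the paper.

// Illustrative sketch only. A hypothetical kernel caches the target
// array `coef` in shared memory; whether this staging helps or hurts
// is the binary property the paper's models are trained to predict.
#include <cstdio>
#include <cuda_runtime.h>

#define N 1024         // hypothetical problem size
#define COEF_SIZE 256  // hypothetical size of the cached target array

__global__ void apply_coefs(const float *in, const float *coef,
                            float *out, int n) {
    // Stage the target array into shared memory once per thread block.
    __shared__ float s_coef[COEF_SIZE];
    for (int i = threadIdx.x; i < COEF_SIZE; i += blockDim.x)
        s_coef[i] = coef[i];
    __syncthreads();

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float acc = 0.0f;
        // Every thread reads the whole cached array, so reuse is high
        // and the copy is likely profitable; with low reuse, the extra
        // staging and shared-memory pressure can instead slow it down.
        for (int j = 0; j < COEF_SIZE; ++j)
            acc += in[idx] * s_coef[j];
        out[idx] = acc;
    }
}

int main() {
    float *in, *coef, *out;
    cudaMallocManaged(&in, N * sizeof(float));
    cudaMallocManaged(&coef, COEF_SIZE * sizeof(float));
    cudaMallocManaged(&out, N * sizeof(float));
    for (int i = 0; i < N; ++i) in[i] = 1.0f;
    for (int i = 0; i < COEF_SIZE; ++i) coef[i] = 0.5f;

    apply_coefs<<<(N + 255) / 256, 256>>>(in, coef, out, N);
    cudaDeviceSynchronize();
    printf("out[0] = %f\n", out[0]);  // expect 256 * 0.5 = 128
    cudaFree(in); cudaFree(coef); cudaFree(out);
    return 0;
}

The trade-off the models must capture is visible here: the shared-memory copy pays off only when per-block reuse of the array outweighs the staging cost and the reduction in occupancy from the extra shared-memory usage, which is why a large, diverse training set matters.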