Abstract

Adapting applications to make optimal use of the available hardware is no mean feat: the sheer number of possible optimizations makes it infeasible to tune them manually. To this end, auto-tuning frameworks are used to automate this task; they in turn rely on optimization algorithms to efficiently search the vast search spaces. However, studies presenting advances in auto-tuning frameworks and the optimization algorithms they incorporate lack comparability. Because each publication differs in how experiments are conducted, which metrics are used, and how results are reported, comparing the performance of optimization algorithms across publications is infeasible. The auto-tuning community identified this as a key challenge at the 2022 Lorentz Center workshop on auto-tuning, and our examination of the current state of the practice in this paper further underlines it. We propose a community-driven methodology composed of four steps covering experimental setup, tuning budget, dealing with stochasticity, and quantifying performance. This methodology builds on similar methodologies in other fields while accounting for the constraints and specific characteristics of the auto-tuning field, resulting in novel techniques. We demonstrate the methodology in a simple case study comparing the performance of several optimization algorithms used to auto-tune CUDA kernels on a set of modern GPUs. Finally, we provide a software tool that makes the methodology easy for authors to apply and simplifies the reproducibility of results.
