Abstract

Autotuning, the practice of automatic tuning of applications to provide performance portability, has received increased attention in the research community, especially in high performance computing. Ensuring high performance on a variety of hardware usually means modifications to the code, often via different values of a selected set of parameters, such as tiling size, loop unrolling factor, or data layout. However, the search space of all possible combinations of these parameters can be large, which can result in cases where the benefits of autotuning are outweighed by its cost, especially with dynamic tuning. Therefore, estimating the tuning time in advance or shortening the tuning time is very important in dynamic tuning applications. We have found that certain properties of tuning spaces do not vary much when hardware is changed. In this article, we demonstrate that it is possible to use historical data to reliably predict the number of tuning steps that is necessary to find a well‐performing configuration and to reduce the size of the tuning space. We evaluate our hypotheses on a number of HPC benchmarks written in CUDA and OpenCL, using several different generations of GPUs and CPUs.
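
To make the size of the search space and the notion of "number of tuning steps" concrete, the following is a minimal sketch and not the method evaluated in the article. It assumes a hypothetical tuning space (the parameter names and values are invented for illustration), a uniform random search with replacement, and a fraction of well-performing configurations that is taken as known from historical data on other hardware. Under those assumptions, the number of steps needed to hit at least one well-performing configuration with a given probability follows from the geometric distribution.

```python
import itertools
import math

# Hypothetical tuning parameters and values, chosen only for illustration.
tuning_space = {
    "tile_size": [8, 16, 32, 64],
    "unroll_factor": [1, 2, 4, 8],
    "data_layout": ["AoS", "SoA"],
}


def enumerate_configurations(space):
    """Return every combination of parameter values as a list of dicts."""
    names = list(space)
    return [
        dict(zip(names, values))
        for values in itertools.product(*space.values())
    ]


def steps_for_confidence(good_fraction, confidence):
    """Steps of uniform random search (with replacement) needed to sample
    at least one well-performing configuration with probability >= confidence.

    good_fraction is assumed to come from historical tuning data; this is
    an illustrative model, not the predictor described in the article.
    """
    if not 0.0 < good_fraction <= 1.0:
        raise ValueError("good_fraction must be in (0, 1]")
    if not 0.0 <= confidence < 1.0:
        raise ValueError("confidence must be in [0, 1)")
    if good_fraction == 1.0:
        return 1
    # P(no hit in n steps) = (1 - p)^n <= 1 - confidence
    steps = math.log(1.0 - confidence) / math.log(1.0 - good_fraction)
    return max(1, math.ceil(steps))


configurations = enumerate_configurations(tuning_space)
print(len(configurations))               # 4 * 4 * 2 = 32 configurations
print(steps_for_confidence(0.05, 0.95))  # 59 steps if 5% of configurations are good
```

Even this small example shows why the cost matters: each additional parameter multiplies the number of configurations, while the step estimate depends only on the fraction of good configurations, which is the kind of property that could plausibly be carried over from one device to another.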
