Abstract

Kernel scheduling is the problem of finding the most efficient implementation for a computational kernel. Identifying this implementation involves experimenting with the parameters of compiler optimizations, such as the size of tiling windows and unrolling factors. This paper shows that it is possible to organize these parameters as points in a coordinate space. The function that maps these points to the running time of kernels, in general, will not determine a convex surface. However, this paper provides empirical evidence that the origin of this surface—an unoptimized kernel—and its global optimum—the fastest kernel—reside on a convex region. We call this hypothesis the “droplet expectation”. Consequently, a search method based on the coordinate descent algorithm tends to find the optimal kernel configuration quickly if the hypothesis holds. This approach—called Droplet Search—has been available in Apache TVM since April of 2023. Experimental results with six large deep learning models on various computing devices (ARM, Intel, AMD, and NVIDIA) indicate that Droplet Search is not only as effective as other AutoTVM search techniques but also two to ten times faster. Moreover, models generated by Droplet Search are competitive with those produced by TVM’s AutoScheduler (Ansor), despite the latter using four to five times more code transformations than AutoTVM.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call