Abstract

We present a lightweight Coq framework for optimizing tensor kernels written in a pure, functional array language. Optimizations rely on user scheduling using a series of verified, semantics-preserving rewrites. Unusually for compilation targeting imperative code with arrays and nested loops, all rewrites are source-to-source within a purely functional language. Our language comprises a set of core constructs for expressing high-level computational detail and a set of what we call reshape operators, which can be derived from core constructs but trigger low-level decisions about storage patterns and ordering. We demonstrate not only that this system can derive the optimizations of existing state-of-the-art languages like Halide and generate comparably performant code, but also that it can schedule a family of useful program transformations beyond what is reachable in Halide.
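To make the idea of source-to-source rewrites over a pure array language concrete, here is a minimal Python sketch. The constructs `gen` (build an array from an index function) and `tsum` (reduce over an index range), the reshape operators `flatten` and `transpose`, and the tiling rewrite `split_gen` are all hypothetical stand-ins, not the framework's actual Coq definitions. For brevity the rewrite is shown as an equivalent program rather than as a transformation on syntax, and equivalence is only checked on one input, whereas the framework proves it for all inputs.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def gen(n: int, f: Callable[[int], T]) -> List[T]:
    # Core construct (hypothetical): the length-n array whose i-th element is f(i).
    return [f(i) for i in range(n)]

def tsum(n: int, f: Callable[[int], float]) -> float:
    # Core construct (hypothetical): the sum of f(i) over i in [0, n).
    total = 0.0
    for i in range(n):
        total += f(i)
    return total

def flatten(blocks: List[List[T]]) -> List[T]:
    # Reshape operator defined via gen: result[i] == blocks[i // k][i % k].
    k = len(blocks[0])
    return gen(len(blocks) * k, lambda i: blocks[i // k][i % k])

def transpose(a: List[List[T]]) -> List[List[T]]:
    # Reshape operator defined via gen: swaps the two index dimensions.
    rows, cols = len(a), len(a[0])
    return gen(cols, lambda j: gen(rows, lambda i: a[i][j]))

def split_gen(n: int, k: int, f: Callable[[int], T]) -> List[T]:
    # Rewrite of gen(n, f): tile the index space into n // k blocks of size k
    # and flatten them back. The value is unchanged whenever k divides n.
    assert n % k == 0, "this sketch only handles k dividing n"
    return flatten(gen(n // k, lambda io: gen(k, lambda ii: f(io * k + ii))))

def matvec(a, x):
    # Matrix-vector product written once in the pure language.
    return gen(len(a), lambda i: tsum(len(x), lambda j: a[i][j] * x[j]))

def matvec_split(a, x, k):
    # The same program after rewriting its outer gen with split_gen.
    return split_gen(len(a), k, lambda i: tsum(len(x), lambda j: a[i][j] * x[j]))

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
x = [1, 2, 3]
print(matvec(A, x))                           # [14.0, 32.0, 50.0, 68.0]
print(matvec_split(A, x, 2) == matvec(A, x))  # True
print(transpose(transpose(A)) == A)           # True
```

The reshape operators are ordinary `gen` expressions, which mirrors the abstract's point that they are derivable from core constructs; what they change is the shape and ordering a later code generator would see.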

Highlights

  • In high-performance computing, a single natural algorithm over multidimensional arrays may have a bewildering variety of different code realizations, to optimize for performance on different machines

  • The transformations may be checked by compilers, so that functionality bugs can arise only in the algorithm itself, not in the specific optimizations applied to it

  • Programming languages like Halide for graphics [Ragan-Kelley et al. 2013] and TVM for machine learning [Chen et al. 2018] have emerged to directly facilitate programming in this style, with compilers driven by optimization directives
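The separation these highlights describe, one algorithm with several code realizations, can be illustrated with a small two-stage box blur. The two functions below are plain Python stand-ins for loop nests that two different Halide-style schedules might produce; they are not Halide's API, and the names `blur_breadth_first` and `blur_inlined` are hypothetical. Running both on a test image only shows agreement on that input; a verified rewrite framework would establish it for all inputs.

```python
from typing import List

Image = List[List[int]]

def blur_x_at(img: Image, y: int, x: int) -> int:
    # Algorithm, stage 1: horizontal 3-tap sum at (y, x).
    return img[y][x - 1] + img[y][x] + img[y][x + 1]

def blur_breadth_first(img: Image) -> Image:
    # Schedule A: compute stage 1 over every row into a buffer, then
    # read that buffer in stage 2 (extra storage, no recomputation).
    h, w = len(img), len(img[0])
    bx = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(1, w - 1):
            bx[y][x] = blur_x_at(img, y, x)
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = bx[y - 1][x] + bx[y][x] + bx[y + 1][x]
    return out

def blur_inlined(img: Image) -> Image:
    # Schedule B: inline stage 1 into stage 2 (no buffer, but each
    # horizontal sum is recomputed for up to three output rows).
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(blur_x_at(img, y + dy, x) for dy in (-1, 0, 1))
    return out

img = [[(3 * y + 5 * x) % 7 for x in range(6)] for y in range(5)]
print(blur_breadth_first(img) == blur_inlined(img))  # True
```

Both variants leave the one-pixel border at zero, so they agree everywhere; they differ only in storage and recomputation, which is exactly the kind of choice a schedule controls.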


Summary

INTRODUCTION

We present a framework embedded in the Coq proof assistant, with a language of optimization commands that is simultaneously more formally assured and more flexible than those of past work. We can imagine composing an algorithm-soundness proof with one of our derivations of optimized code and with a correctness proof for a lower-level-language compiler or even a hardware accelerator, all of which are worthwhile directions for future work. We then define our language (including its formal semantics) bottom-up, before proceeding through three crucial elements of our pipeline: basic scheduling rewrites, lowering to imperative code, and reshape operators. After an interlude explaining Coq encoding details, we present preliminary results from an empirical evaluation showing that we achieve performance competitive with Halide on a small set of examples and that we can compile respectably fast versions of some algorithms beyond Halide’s applicability.
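The lowering step mentioned above can be pictured with a Python sketch that walks a tiny functional syntax tree and emits a C-style loop nest. The node types `Gen`, `Sum`, `Access`, and `Mul` and the function `lower` are hypothetical; the sketch assumes every `Sum` has a scalar body, and it skips the normalization and proof obligations that the Coq pipeline handles.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class Access:
    array: str
    indices: tuple  # index-variable names, e.g. ("i", "j")

@dataclass(frozen=True)
class Mul:
    lhs: "Expr"
    rhs: "Expr"

@dataclass(frozen=True)
class Sum:
    var: str
    extent: str
    body: "Expr"  # assumed scalar in this sketch

@dataclass(frozen=True)
class Gen:
    var: str
    extent: str
    body: "Expr"

Expr = Union[Access, Mul, Sum, Gen]

def render(e: Expr) -> str:
    # Render a loop-free scalar expression as C text.
    if isinstance(e, Access):
        return e.array + "".join(f"[{i}]" for i in e.indices)
    if isinstance(e, Mul):
        return f"({render(e.lhs)} * {render(e.rhs)})"
    raise ValueError(f"not a scalar expression: {e}")

def lower(e: Expr, dest: str, depth: int = 0, accumulate: bool = False) -> List[str]:
    # Gen becomes a loop that writes one output slot per iteration;
    # Sum becomes an initialization followed by an accumulating loop.
    pad = "  " * depth
    if isinstance(e, Gen):
        lines = [f"{pad}for (int {e.var} = 0; {e.var} < {e.extent}; {e.var}++) {{"]
        lines += lower(e.body, f"{dest}[{e.var}]", depth + 1, accumulate)
        return lines + [f"{pad}}}"]
    if isinstance(e, Sum):
        lines = [] if accumulate else [f"{pad}{dest} = 0;"]
        lines.append(f"{pad}for (int {e.var} = 0; {e.var} < {e.extent}; {e.var}++) {{")
        lines += lower(e.body, dest, depth + 1, accumulate=True)
        return lines + [f"{pad}}}"]
    op = "+=" if accumulate else "="
    return [f"{pad}{dest} {op} {render(e)};"]

# y = gen i < m. sum j < n. A[i][j] * x[j]
matvec_ast = Gen("i", "m", Sum("j", "n", Mul(Access("A", ("i", "j")), Access("x", ("j",)))))
print("\n".join(lower(matvec_ast, "y")))
```

For the matrix-vector example this produces a two-level loop nest that zeroes `y[i]` and then accumulates `(A[i][j] * x[j])` into it, which is the imperative shape the functional `gen`/`sum` structure maps onto directly.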

OVERVIEW AND MOTIVATING EXAMPLE
CORE LANGUAGE
Specification
THE SCHEDULING-REWRITE FRAMEWORK
Scheduling Rewrites
Binders and Contexts
Rewrite Tactics and Automation
COMPILATION
Normalization
Code Generation
RESHAPE OPERATORS
Compute and Storage Order
Safe Garbage
Adjoint Introduction
IMPLEMENTATION DETAIL
EXPERIMENTAL EVALUATION
Scatter-to-Gather Optimization
RELATED WORK