CHAD for expressive total languages
Abstract We show how to apply forward and reverse mode Combinatory Homomorphic Automatic Differentiation (CHAD) (Vákár (2021). ESOP, 607–634; Vákár and Smeding (2022). ACM Transactions on Programming Languages and Systems 44 (3) 20:1–20:49.) to total functional programming languages with expressive type systems featuring the combination of tuple types, sum types, inductive types, coinductive types, and function types. We achieve this by analyzing the categorical semantics of such types in $\Sigma$-types (Grothendieck constructions) of suitable categories. Using a novel categorical logical relations technique for such expressive type systems, we give a correctness proof of CHAD in this setting by showing that it computes the usual mathematical derivative of the function that the original program implements. The result is a principled, purely functional and provably correct method for performing forward- and reverse-mode automatic differentiation (AD) on total functional programming languages with expressive type systems.
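As a rough illustration of the idea behind reverse-mode CHAD, the sketch below pairs each primal computation with a linear map that propagates cotangents backwards, and builds composite programs compositionally. It is a minimal sketch in Python rather than the typed functional setting of the paper; the names `compose`, `pair_r`, `sin_r`, `square_r`, and `mul_r` are hypothetical and chosen only for this example, and only tuple types over real numbers are covered.

```python
import math

# Assumption for this sketch: a differentiable map is represented as a
# function x -> (y, pullback), where pullback sends an output cotangent
# to an input cotangent. All names below are hypothetical.

def compose(f, g):
    """Paired representation of 'g after f'; pullbacks compose in reverse."""
    def h(x):
        y, f_back = f(x)
        z, g_back = g(y)
        return z, lambda dz: f_back(g_back(dz))
    return h

def pair_r(f, g):
    """Tupling x -> (f(x), g(x)); the cotangents of both components are added."""
    def h(x):
        a, f_back = f(x)
        b, g_back = g(x)
        return (a, b), lambda d: f_back(d[0]) + g_back(d[1])
    return h

def sin_r(x):
    return math.sin(x), lambda dy: math.cos(x) * dy

def square_r(x):
    return x * x, lambda dy: 2.0 * x * dy

def mul_r(p):
    a, b = p
    return a * b, lambda dy: (b * dy, a * dy)

# The program x |-> sin(x) * x^2, assembled from the pieces above.
program = compose(pair_r(sin_r, square_r), mul_r)
value, back = program(1.5)
gradient = back(1.0)  # derivative of the program at x = 1.5
```

Adding the cotangents in `pair_r` reflects that a shared input accumulates contributions from every use, which is where a commutative monoid structure on cotangents is needed.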
- Book Chapter
7
- 10.1137/1.9781611976489.11
- Jan 1, 2021
Automatic differentiation (AD) is a technique for computing the derivative of a function $F: \mathbb{R}^n \to \mathbb{R}^m$ defined by a computer program. Modern applications of AD, such as machine learning, typically use AD to facilitate gradient-based optimization of an objective function for which $m \ll n$ (often $m = 1$). As a result, these applications typically use reverse (or adjoint) mode AD to compute the gradient of $F$ efficiently, in time $\Theta(m \cdot T_1(F))$, where $T_1$ is the work (serial running time) of $F$. Although the serial running time of reverse-mode AD has a well known relationship to the total work of $F$, general-purpose reverse-mode AD has proven challenging to parallelize in a work-efficient and scalable fashion, as simple approaches tend to result in poor performance or scalability. This paper introduces PARAD, a work-efficient parallel algorithm for reverse-mode AD of determinacy-race-free recursive fork-join programs. We analyze the performance of PARAD using work/span analysis. Given a program $F$ with work $T_1(F)$ and span (critical-path length) $T_\infty(F)$, PARAD performs reverse-mode AD of $F$ in $O(m \cdot T_1(F))$ work and $O(\log m + \log(T_1(F))\, T_\infty(F))$ span. To the best of our knowledge, PARAD is the first parallel algorithm for performing reverse-mode AD that is both provably work-efficient and has span within a polylogarithmic factor of the original program $F$. We implemented PARAD as an extension of Adept, a C++ library for performing reverse-mode AD for serial programs that is known for its efficiency. Our implementation supports the use of Cilk fork-join parallelism and requires no programmer annotations of parallel control flow. Instead, it uses compiler instrumentation to dynamically trace a program's series-parallel structure, which is used to automatically parallelize the gradient computation via reverse-mode AD. 
On eight machine-learning benchmarks, our implementation of PARAD achieves 1.5× geometric-mean multiplicative work overhead relative to the serial Adept tool, and 8.9× geometric-mean self-relative speedup on 18 cores.
- Research Article
14
- 10.1145/3527634
- Aug 17, 2022
- ACM Transactions on Programming Languages and Systems
We introduce Combinatory Homomorphic Automatic Differentiation (CHAD), a principled, pure, provably correct define-then-run method for performing forward and reverse mode automatic differentiation (AD) on programming languages with expressive features. It implements AD as a compositional, type-respecting source-code transformation that generates purely functional code. This code transformation is principled in the sense that it is the unique homomorphic (structure preserving) extension to expressive languages of Elliott’s well-known and unambiguous definitions of AD for a first-order functional language. Correctness of the method follows by a (compositional) logical relations argument that shows that the semantics of the syntactic derivative is the usual calculus derivative of the semantics of the original program. In their most elegant formulation, the transformations generate code with linear types. However, the code transformations can be implemented in a standard functional language lacking linear types: While the correctness proof requires tracking of linearity, the actual transformations do not. In fact, even in a standard functional language, we can get all of the type-safety that linear types give us: We can implement all linear types used to type the transformations as abstract types by using a basic module system. In this article, we detail the method when applied to a simple higher-order language for manipulating statically sized arrays. However, we explain how the methodology applies, more generally, to functional languages with other expressive features. Finally, we discuss how the scope of CHAD extends beyond applications in AD to other dynamic program analyses that accumulate data in a commutative monoid.
- Research Article
5
- 10.2168/lmcs-9(1:14)2013
- Mar 29, 2013
- Logical Methods in Computer Science
This paper extends the dual calculus with inductive types and coinductive types. The paper first introduces a non-deterministic dual calculus with inductive and coinductive types. Besides the same duality of the original dual calculus, it has the duality of inductive and coinductive types, that is, the duality of terms and coterms for inductive and coinductive types, and the duality of their reduction rules. Its strong normalization is also proved, which is shown by translating it into a second-order dual calculus. The strong normalization of the second-order dual calculus is proved by translating it into the second-order symmetric lambda calculus. This paper then introduces a call-by-value system and a call-by-name system of the dual calculus with inductive and coinductive types, and shows the duality of call-by-value and call-by-name, their Church-Rosser properties, and their strong normalization. Their strong normalization is proved by translating them into the non-deterministic dual calculus with inductive and coinductive types.
- Research Article
27
- 10.3233/fi-2017-1473
- Mar 3, 2017
- Fundamenta Informaticae
Functional languages offer a high level of abstraction, which results in programs that are elegant and easy to understand. Central to the development of functional programming are inductive and coinductive types and associated programming constructs, such as pattern-matching. Whereas inductive types have a long tradition and are well supported in most languages, coinductive types are the subject of more recent research and are less mainstream. We present CoCaml, a functional programming language extending OCaml, which allows us to define recursive functions on regular coinductive datatypes. These functions are defined like usual recursive functions, but parameterized by an equation solver. We present a full implementation of all the constructs and solvers and show how these can be used in a variety of examples, including operations on infinite lists, infinitary λ-terms, and p-adic numbers.
- Dataset
- 10.15200/winn.162133.38896
- May 18, 2021
- The Winnower
Automatic differentiation is a "compiler trick" whereby a code that calculates f(x) is transformed into a code that calculates f'(x). This trick and its two forms, forward and reverse mode automatic differentiation, have become the pervasive backbone behind all of the machine learning libraries. If you ask what PyTorch or Flux.jl is doing that's special, the answer is really that it's doing automatic differentiation over some functions.
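To make the transformation from code for f(x) into code for f'(x) concrete, here is a minimal sketch of forward-mode AD using dual numbers. It uses operator overloading rather than a compiler pass, and the names `Dual`, `lift`, `exp`, and `derivative` are hypothetical helpers written for this example; they are not the API of PyTorch or Flux.jl.

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class Dual:
    """A value together with its tangent (derivative along the input)."""
    val: float
    tan: float

    def __add__(self, other):
        other = lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

    __rmul__ = __mul__

def lift(x):
    """Treat plain numbers as constants with zero tangent."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)

def exp(x):
    x = lift(x)
    e = math.exp(x.val)
    return Dual(e, e * x.tan)  # chain rule for exp

def derivative(f, x):
    """Evaluate f'(x) by seeding the input tangent with 1."""
    return f(Dual(x, 1.0)).tan

def f(x):
    return 3 * x * x + exp(x)

slope = derivative(f, 2.0)  # analytically 6*2 + e^2
```

Forward mode costs one pass per input direction, which is why reverse mode is usually preferred when there are many inputs and a single scalar output.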
- Research Article
119
- 10.1145/1330017.1330018
- Mar 1, 2008
- ACM Transactions on Programming Languages and Systems
We show that reverse-mode AD (Automatic Differentiation)—a generalized gradient-calculation operator—can be incorporated as a first-class function in an augmented lambda calculus, and therefore into a functional-programming language. Closure is achieved, in that the new operator can be applied to any expression in the augmented language, yielding an expression in that language. This requires the resolution of two major technical issues: (a) how to transform nested lambda expressions, including those with free-variable references, and (b) how to support self application of the AD machinery. AD transformations preserve certain complexity properties, among them that the reverse phase of the reverse-mode AD transformation of a function have the same temporal complexity as the original untransformed function. First-class unrestricted AD operators increase the expressive power available to the numeric programmer, and may have significant practical implications for the construction of numeric software that is robust, modular, concise, correct, and efficient.
- Research Article
3
- 10.1017/s0960129524000215
- Oct 21, 2024
- Mathematical Structures in Computer Science
We give a simple, direct, and reusable logical relations technique for languages with term and type recursion and partially defined differentiable functions. We demonstrate it by working out the case of automatic differentiation (AD) correctness: namely, we present a correctness proof of a dual numbers style AD code transformation for realistic functional languages in the ML-family. We also show how this code transformation provides us with correct forward- and reverse-mode AD. The starting point is to interpret a functional programming language as a suitable freely generated categorical structure. In this setting, by the universal property of the syntactic categorical structure, the dual numbers AD code transformation and the basic $\boldsymbol{\omega}\mathbf{Cpo}$-semantics arise as structure preserving functors. The proof follows, then, by a novel logical relations argument. The key to much of our contribution is a powerful monadic logical relations technique for term recursion and recursive types. It provides us with a semantic correctness proof based on a simple approach for denotational semantics, making use only of the very basic concrete model of $\omega$-cpos.
- Research Article
- 10.1145/1411203.1411207
- Sep 20, 2008
- ACM SIGPLAN Notices
With features that include lightweight syntax, expressive type systems, and deep semantic foundations, functional languages are now being used to develop an increasingly broad range of complex, real-world applications. In the area of systems software, however, where performance and interaction with low-level aspects of hardware are central concerns, many practitioners still eschew the advantages of higher-level languages for the potentially unsafe but predictable behavior of traditional imperative languages like C. It is ironic that critical applications such as operating systems kernels, device drivers, and VMMs - where a single bug could compromise the reliability or security of a whole system - are among the least likely to benefit from the abstractions and safety guarantees of modern language designs. Over the last few years, our group has been investigating the potential for using Haskell to develop realistic operating systems that can boot and run on bare metal. The House system, developed primarily by Thomas Hallgren and Andrew Tolmach, demonstrates that it is indeed possible to construct systems software in a functional language. But House still relies on a layer of runtime support primitives - some written using unsafe Haskell primitives and others written in C - to provide services ranging from garbage collection to control of the page table structures used by the hardware memory management unit. We would like to replace as much of this layer as possible with code written in a functional language without compromising on type or memory safety. Our experiences with House have led us to believe that a new functional language is required to reflect the needs of the systems domain more directly. Interestingly, however, we have concluded that this does not require fundamental new language design. 
In this invited talk, I will give an update on the current status of our project and I will describe how we are leveraging familiar components of the Haskell type system - including polymorphism, kinds, qualified types and improvement - to capture more precise details of effect usage, data representation, and termination. I will also discuss the challenges of writing and compiling performance-sensitive code written in a functional style. It was once considered radical to use C in place of assembly language to construct systems software. Is it possible that functional languages might one day become as commonplace in this application domain as C is today?
- Book Chapter
6
- 10.1007/978-3-642-02348-4_16
- Jan 1, 2009
This paper gives an extension of Dual Calculus by introducing inductive types and coinductive types. The same duality as Dual Calculus is shown to hold in the new system, that is, this paper presents its involution for the new system and proves that it preserves both typing and reduction. The duality between inductive types and coinductive types is shown by the existence of the involution that maps an inductive type and a coinductive type to each other. The strong normalization in this system is also proved. First, strong normalization in second-order Dual Calculus is shown by translating it into second-order symmetric lambda calculus. Next, strong normalization in Dual Calculus with inductive and coinductive types is proved by translating it into second-order Dual Calculus.
- Research Article
9
- 10.1080/10556789808805717
- Jan 1, 1998
- Optimization Methods and Software
Given a program computing the value of a function with many variables, reverse-mode automatic differentiation (or the top-down algorithm of automatic differentiation) swiftly computes the values of the partial derivatives of the function. Its weak point is that it requires storage whose size is proportional to the complexity of the underlying function. We report on a preprocessor that can handle any Fortran77 program with an improved reverse-mode automatic differentiation that reduces the size of the storage by means of a recursive checkpointing mechanism. By developing a library program named RCL/fork (Recursive Checkpointing Library program with fork system-call), based on the fork system-call provided by the UNIX operating system, we could reduce the size of the virtual memory to less than half for a computation of the partial derivatives that requires about 1.3 GB of virtual memory with the original reverse-mode automatic differentiation.
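The storage trade-off described here can be sketched on a toy problem: differentiating n repeated applications of a scalar step function. The example below is an illustrative recursive-bisection checkpointing scheme in Python, not the RCL/fork mechanism or its Fortran77 preprocessor; `step`, `step_grad`, and both `reverse_*` functions are hypothetical names introduced for this sketch.

```python
import math

def step(x):
    """Hypothetical single step of the computation being differentiated."""
    return math.sin(x) + 0.5 * x

def step_grad(x):
    """Derivative of step at x."""
    return math.cos(x) + 0.5

def reverse_store_all(x0, n, adj):
    """Plain reverse mode: store every intermediate state, O(n) memory."""
    states = []
    x = x0
    for _ in range(n):
        states.append(x)
        x = step(x)
    for x in reversed(states):
        adj = step_grad(x) * adj
    return adj

def reverse_checkpointed(x0, n, adj):
    """Recursive bisection: keep only the start of each segment and
    recompute the midpoint, trading extra work for O(log n) live states."""
    if n == 0:
        return adj
    if n == 1:
        return step_grad(x0) * adj
    half = n // 2
    mid = x0
    for _ in range(half):  # recompute forward from the checkpoint x0
        mid = step(mid)
    adj = reverse_checkpointed(mid, n - half, adj)  # later half first
    return reverse_checkpointed(x0, half, adj)      # then the earlier half

full = reverse_store_all(0.3, 50, 1.0)
cheap = reverse_checkpointed(0.3, 50, 1.0)
```

Both functions return the same derivative; the checkpointed version recomputes forward steps (roughly n log n step evaluations in total) instead of holding all n states in memory at once.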
- Conference Article
28
- 10.1145/503032.503043
- Jan 14, 2002
We investigate CPS translatability of typed λ-calculi with inductive and coinductive types. We show that tenable Plotkin-style call-by-name CPS translations exist for simply typed λ-calculi with a natural number type and stream types and, more generally, with arbitrary positive inductive and coinductive types. These translations also work in the presence of control operators and generalize for dependently typed calculi where case-like eliminations are only allowed in non-dependent forms. No translation is possible along the same lines for small Σ-types and sum types with dependent case.
- Research Article
6
- 10.1145/509799.503043
- Jan 14, 2002
- ACM SIGPLAN Notices
We investigate CPS translatability of typed λ-calculi with inductive and coinductive types. We show that tenable Plotkin-style call-by-name CPS translations exist for simply typed λ-calculi with a natural number type and stream types and, more generally, with arbitrary positive inductive and coinductive types. These translations also work in the presence of control operators and generalize for dependently typed calculi where case-like eliminations are only allowed in non-dependent forms. No translation is possible along the same lines for small Σ-types and sum types with dependent case.
- Research Article
14
- 10.1051/ita:1999120
- Jul 1, 1999
- RAIRO - Theoretical Informatics and Applications
We study five extensions of the polymorphically typed lambda-calculus (system F) by type constructs intended to model fixed-points of monotone operators. Building on work by Geuvers concerning the relation between term rewrite systems for least pre-fixed-points and greatest post-fixed-points of positive type schemes (i.e., non-nested positive inductive and coinductive types) and so-called retract types, we show that there are reduction-preserving embeddings even between systems of monotone (co)inductive types and non-interleaving positive fixed-point types (which are essentially those retract types). The reduction relation considered is β- and η-reduction for system F plus either (full) primitive recursion on the inductive types or (full) primitive corecursion on the coinductive types or an extremely simple rule for the fixed-point types. Monotonicity is not confined to the syntactic restriction on type formation of having only positive occurrences of the type variable α in ρ for the inductive type µαρ or the coinductive type ναρ. Instead of that only a “monotonicity witness” which is a term of type ∀α∀β.(α → β) → ρ → ρ[α:=β] is required. This term may already use (co)recursion such that our monotone (co)inductive types may even be “interleaved” and not only nested.
- Research Article
9
- 10.21314/jcf.2016.209
- Mar 1, 2016
- The Journal of Computational Finance
Automatic differentiation (AD) is a practical field of computational mathematics that is of growing interest across many industries, including finance. The use of reverse-mode AD is particularly interesting, since it allows for the computation of gradients in the same time required to evaluate the objective function itself. However, it requires excessive memory. This memory requirement can make reverse-mode AD infeasible in some cases (depending on the function complexity and available RAM) and slower than expected in others, due to the use of secondary memory and nonlocalized memory references. However, it turns out that many complex (expensive) functions in finance exhibit a natural substitution structure. In this paper, we illustrate this structure in computational finance as it arises in calibration and inverse problems, and determine Greeks in a Monte Carlo setting. In these cases, the required memory is a small fraction of that required by reverse-mode AD, but the computing time complexity is the same. In fact, our results indicate a significant realized speedup compared with straight reverse-mode AD.
- Research Article
22
- 10.1021/acs.jpca.2c05922
- Nov 8, 2022
- The Journal of Physical Chemistry A
Automatic differentiation (AD) has become an important tool for optimization problems in computational science, and it has been applied to the Hartree-Fock method. Although the reverse-mode AD is more efficient than the forward-mode, eigenvalue calculation in the self-consistent field (SCF) method has impeded the use of the reverse-mode AD. Here, we propose a method to directly minimize Hartree-Fock energy under the orthonormality constraint of the molecular orbitals using reverse-mode AD by avoiding eigenvalue calculation. According to our validation, the proposed method was more stable than the conventional SCF method and achieved comparable accuracy.