Abstract

Garbage collectors relieve the programmer from manual memory management, but lead to compiler-generated machine code that can behave differently (e.g. out-of-memory errors) from the source code. To ensure that the generated code behaves exactly like the source code, programmers need a way to answer questions of the form: what is a sufficient amount of memory for my program to never reach an out-of-memory error? This paper develops a cost semantics that can answer such questions for CakeML programs. The work described in this paper is the first to be able to answer such questions with proofs in the context of a language that depends on garbage collection. We demonstrate that positive answers can be used to transfer liveness results proved for the source code to liveness guarantees about the generated machine code. Without guarantees about space usage, only safety results can be transferred from source to machine code. Our cost semantics is phrased in terms of an abstract intermediate language of the CakeML compiler, but results proved at that level map directly to the space cost of the compiler-generated machine code. All of the work described in this paper has been developed in the HOL4 theorem prover.

Highlights

  • High-level programming languages with runtimes that include a garbage collector (GC) provide a layer of abstraction that makes memory seem unbounded. This liberates the programmer from tedious and error-prone manual memory management but it leads to compiler-generated machine code that exhibits a form of partiality: the machine code will behave as the source semantics dictates, unless memory is exhausted

  • We prove that the cost semantics is sound for an end-to-end verified compiler that relies on garbage collection for correct operation

  • We show that the cost semantics is concrete enough to prove specific space bounds for a few sample programs and, once bounds have been proved, liveness properties proved at the level of source code transfer directly to liveness properties about the compiler-generated machine code


Summary

INTRODUCTION

High-level programming languages with runtimes that include a garbage collector (GC) provide a layer of abstraction that makes memory seem unbounded. This liberates the programmer from tedious and error-prone manual memory management, but it leads to compiler-generated machine code that exhibits a form of partiality: the machine code will behave as the source semantics dictates, unless memory is exhausted. Well-written source-level programs stay clear of this partiality by making sure that the live data used by the program stays within some reasonable bound. For such programs, the GC can always reclaim enough memory to provide space for new allocations, even if there is an unbounded number of allocations during program execution. Suppose one uses a program logic [Åman Pohjola et al. 2019] to prove a liveness property stating that a source program will forever print "y". It does not follow from the compiler correctness theorem that the generated machine code will forever do the same: the partiality means that only safety properties carry over. We show that the cost semantics is concrete enough to prove specific space bounds for a few sample programs and, once bounds have been proved, liveness properties proved at the level of source code transfer directly to liveness properties about the compiler-generated machine code. All of the work presented in this paper has been developed in the HOL4 theorem prover [Slind and Norrish 2008] and is available at https://code.cakeml.org.
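The role of bounded live data can be illustrated with a small simulation. The sketch below is a toy model in Python, not CakeML's collector and not the cost semantics developed in the paper; the names `ToyHeap`, `OutOfMemory`, `run`, and `live_at` are hypothetical and exist only for this illustration. It assumes a heap of fixed size in which a collection reclaims every cell that is no longer live.

```python
class OutOfMemory(Exception):
    """Raised when live data plus the new allocation exceed the heap limit."""


class ToyHeap:
    """Toy heap with a fixed capacity in cells (hypothetical model)."""

    def __init__(self, limit):
        self.limit = limit      # total cells available
        self.used = 0           # cells occupied, live or garbage
        self.collections = 0    # number of GC runs so far

    def allocate(self, size, live):
        """Allocate `size` cells while `live` cells are still reachable."""
        if self.used + size > self.limit:
            # Collect: unreachable cells are reclaimed, only live data remains.
            self.used = live
            self.collections += 1
            if self.used + size > self.limit:
                raise OutOfMemory(f"need {live + size} cells, have {self.limit}")
        self.used += size


def run(heap, steps, live_at):
    """Allocate one cell per step; live_at(i) is the live size before step i."""
    for i in range(steps):
        heap.allocate(1, live_at(i))
    return heap.collections


# Bounded live data: only the 3 most recent cells stay reachable.
bounded = ToyHeap(limit=8)
run(bounded, 1000, lambda i: min(i, 3))   # completes; the GC keeps up

# Unbounded live data: every cell stays reachable.
try:
    run(ToyHeap(limit=8), 1000, lambda i: i)
except OutOfMemory as exc:
    print("exited early:", exc)
```

In the first run the number of allocations far exceeds the heap size, yet the program never fails because the live data stays within the limit. In the second run no collection can help, which corresponds to the early exit that a space bound is meant to rule out.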

Limitations
Why Can Generated Code Exit Early?
At What Level of Abstraction Should the Cost Semantics Be Expressed?
A Note on Semantics
Structure of the Proofs
DataLang As an Intermediate Language
A DataLang Program from a User’s Point of View
DataLang As a Cost Semantics
PROVING SOUNDNESS OF HEAP COST
Proving evaluate-Level Simulation
Notation and Invariants
Lessons Learned
PROVING SOUNDNESS OF STACK COST
TOP-LEVEL COMPILER THEOREM WITH COST
PROVING THAT PROGRAMS ARE SAFE FOR SPACE
A Higher-Order Example
Other Examples
RELATED WORK
CONCLUSION
