Abstract

Although compiling queries to efficient machine code has become a common approach for query execution, a number of newly created database system projects still refrain from using compilation. It is sometimes claimed that the intricacies of code generation make compilation-based engines too complex. Also, a major barrier for adoption, especially for interactive ad hoc queries, is long compilation time. In this paper, we examine all stages of compiling query execution engines and show how to reduce compilation overhead. We incorporate the lessons learned from a decade of generating code in HyPer into a design that manages complexity and yields high speed. First, we introduce a code generation framework that establishes abstractions to manage complexity, yet generates code in a single fast pass. Second, we present a program representation whose data structures are tuned to support fast code generation and compilation. Third, we introduce a new compiler backend that is optimized for minimal compile time, and simultaneously, yields superior execution performance to competing approaches, e.g., Volcano-style or bytecode interpretation. We implemented these optimizations in our database system Umbra to show that it is possible to unite fast compilation and fast execution. Indeed, Umbra achieves unprecedentedly low query latencies. On small data sets, it is even faster than interpreter engines like DuckDB and PostgreSQL. At the same time, on large data sets, its throughput is on par with the state-of-the-art compiling system HyPer.
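To make the idea of a program representation tuned for fast code generation more concrete, the Python sketch below shows one common way to get there. Instructions are appended to a flat list and refer to their operands by index, constants are interned so each value is materialized once, and dead code is found in a single reverse pass. This is a hypothetical toy, not Umbra IR's actual layout: the names `Instr`, `Function`, `emit`, `const`, and `live_instrs` are illustrative, and the one-pass liveness walk assumes straight-line code in which every operand is defined before its use.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instr:
    op: str                      # hypothetical opcodes: "const", "add", "mul", "ret"
    args: tuple[int, ...] = ()   # operands, referenced by instruction index
    imm: int | None = None       # immediate payload, used only by "const"


@dataclass
class Function:
    # Toy IR for straight-line code; not Umbra IR's real data structures.
    instrs: list[Instr] = field(default_factory=list)
    consts: dict[int, int] = field(default_factory=dict)  # value -> instr index

    def emit(self, op: str, *args: int) -> int:
        """Append an instruction and return its index, which names its result."""
        self.instrs.append(Instr(op, tuple(args)))
        return len(self.instrs) - 1

    def const(self, value: int) -> int:
        """Intern constants so each distinct value is materialized only once."""
        if value not in self.consts:
            self.instrs.append(Instr("const", (), value))
            self.consts[value] = len(self.instrs) - 1
        return self.consts[value]

    def live_instrs(self) -> list[int]:
        """Return indices of instructions that contribute to a "ret".

        One reverse pass suffices because every operand is appended before
        the instructions that use it.
        """
        live = [False] * len(self.instrs)
        for i in range(len(self.instrs) - 1, -1, -1):
            instr = self.instrs[i]
            if instr.op == "ret":
                live[i] = True  # side-effecting root
            if live[i]:
                for operand in instr.args:
                    live[operand] = True
        return [i for i, is_live in enumerate(live) if is_live]


# Usage: the "mul" result is never used, so it is reported as dead.
f = Function()
two, three = f.const(2), f.const(3)
total = f.emit("add", two, three)
unused = f.emit("mul", total, f.const(2))  # reuses the interned constant 2
f.emit("ret", total)
print(f.live_instrs())  # [0, 1, 2, 4]
```

Keeping operands as plain integer indices into one contiguous list avoids per-instruction heap objects and pointer chasing; a real IR with control flow and phi nodes would need a more general liveness pass than this single reverse walk.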

Highlights

  • Query compilation is a widely adopted approach for relational database systems [1,7,10,34,46]

  • Umbra IR speeds up code generation (Sect. 5.3).

  • The Flying Start backend dominates multiple state-of-the-art alternatives (Sect. 5.4).

  • The optimizations in the Flying Start backend all provide performance benefits (Sect. 5.5).

  • We conclude that Umbra IR speeds up code generation and serves its purpose well as it effectively reduces Umbra’s query latency


Summary

Introduction

Query compilation is a widely adopted approach for relational database systems [1,7,10,34,46]. Tidy Tuples uses Umbra IR as the target for the code generator and as the source for all compilation backends. This reduces the time to generate programs and to transform them into executables. Adaptive execution was first introduced in the HyPer query engine. For query execution, it has a choice between using intensively optimized code for high-speed execution and two low-latency compilation backends. With the Flying Start backend, we show a solution for the low-latency end of the spectrum, i.e., short-running queries. It generates code even faster than HyPer’s bytecode interpreter, and the resulting execution speed is on par with HyPer’s LLVM-generated code. Together, these three components (Tidy Tuples, Umbra IR, and Flying Start) achieve query latencies for short-running queries that previously were only possible using interpretation. Experimental results show that the triad is so effective at reducing latency that Umbra reaches the latency realm of interpretation-based engines like DuckDB and PostgreSQL, all while keeping the execu-
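Adaptive execution, as described above, amounts to a cost decision: keep running the code already produced by a low-latency backend, or pay extra compile time for faster code. The Python sketch below picks whichever mode minimizes estimated compile time plus remaining run time. It is a simplified illustration under stated assumptions, not HyPer's or Umbra's implementation: the mode names, the `ModeCost` fields, and the cost numbers are hypothetical, and a real engine would estimate throughput from progress it has already measured.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    # Hypothetical execution modes, for illustration only.
    FLYING_START = "flying_start"  # low-latency backend: cheap to compile
    OPTIMIZED = "optimized"        # optimizing backend: costly to compile, faster code


@dataclass(frozen=True)
class ModeCost:
    compile_seconds: float    # estimated time to produce code in this mode
    tuples_per_second: float  # estimated throughput of the produced code


def choose_mode(current: Mode, remaining_tuples: float,
                costs: dict[Mode, ModeCost]) -> Mode:
    """Pick the mode with the lowest compile time + remaining run time.

    Simplification: code for the current mode already exists, so staying
    costs no compile time, while switching pays the target's compile time.
    """
    def estimated_total(mode: Mode) -> float:
        cost = costs[mode]
        compile_time = 0.0 if mode is current else cost.compile_seconds
        return compile_time + remaining_tuples / cost.tuples_per_second

    return min(costs, key=estimated_total)


# Illustrative numbers only; not measurements from the paper.
costs = {
    Mode.FLYING_START: ModeCost(compile_seconds=0.001, tuples_per_second=50e6),
    Mode.OPTIMIZED: ModeCost(compile_seconds=0.050, tuples_per_second=100e6),
}
print(choose_mode(Mode.FLYING_START, 1e5, costs).name)  # FLYING_START
print(choose_mode(Mode.FLYING_START, 1e9, costs).name)  # OPTIMIZED
```

With a small input, the optimizing mode's compile time cannot be amortized, so the sketch stays with the low-latency mode; with a large input, switching pays off. Shrinking the compile cost of the low-latency path is what lets short-running queries avoid the expensive mode entirely.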

Tidy Tuples: a low-latency code generation framework
Background: compilation pipeline
Layer overview
From operators to instructions
SQL values
Primitive types for code generation
Host language integration
Control flow
Umbra IR structure
Umbra program representation
Constants and dead-code removal
DBMS-specific instructions
Comparison to LLVM IR
Flying Start backend
Minimal compile-time design
Machine register allocation
To be exact
Result info
Implementation of Flying Start
Evaluation
Experimental setup
Compilation time
Runtime performance robustness
Flying Start optimizations
Implementation effort
Summary