Abstract

Optimizing the physical storage of data and optimizing its retrieval are two key database management problems. In this paper, we propose a language that can express both a relational query and the layout of its data. Our language can express a wide range of physical database layouts, going well beyond the row- and column-based methods that are widely used in database management systems. We use deductive program synthesis to turn a high-level relational representation of a database query into a highly optimized low-level implementation that operates on a specialized layout of the dataset. We build an optimizing compiler for this language and conduct experiments using a popular database benchmark; these experiments show that our specialized queries outperform a state-of-the-art in-memory compiled database system while achieving an order-of-magnitude reduction in memory use.
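To make the distinction between layouts concrete, here is a minimal sketch in Python that stores one toy relation three ways: row-oriented, column-oriented, and nested (grouped by a key). It does not use Castor's syntax or its layout algebra. The relation, the function names, and the grouping key are all hypothetical and chosen only to show that one query can run against several physical layouts and return the same answer.

```python
from typing import Dict, List, Tuple

# Hypothetical toy relation: (order_id, customer, price).
ROWS: List[Tuple[int, str, float]] = [
    (1, "alice", 10.0),
    (2, "bob", 5.0),
    (3, "alice", 7.5),
]

# Row layout: each tuple is stored as one unit.
row_layout = list(ROWS)

# Column layout: each attribute is stored as its own array.
col_layout: Dict[str, list] = {
    "order_id": [r[0] for r in ROWS],
    "customer": [r[1] for r in ROWS],
    "price": [r[2] for r in ROWS],
}

# Nested layout (illustrative only): tuples grouped under their customer,
# so a lookup by customer goes straight to the matching group.
nested_layout: Dict[str, List[Tuple[int, float]]] = {}
for oid, cust, price in ROWS:
    nested_layout.setdefault(cust, []).append((oid, price))


def total_for_row(layout, customer):
    # Scan every tuple and filter on customer.
    return sum(p for _, c, p in layout if c == customer)


def total_for_col(layout, customer):
    # Read only the two columns the query touches.
    return sum(p for c, p in zip(layout["customer"], layout["price"]) if c == customer)


def total_for_nested(layout, customer):
    # Jump to the customer's group; other customers are never visited.
    return sum(p for _, p in layout.get(customer, []))


print(total_for_row(row_layout, "alice"),
      total_for_col(col_layout, "alice"),
      total_for_nested(nested_layout, "alice"))  # prints "17.5 17.5 17.5"
```

The three functions compute the same answer. They differ only in which data they must touch, and that difference is what a layout-aware compiler can exploit.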

Highlights

  • Traditional database systems are generic and powerful, but they are not well optimized for static databases

  • This paper introduces Castor: a domain-specific language and compiler for building static databases

  • We show that Castor is competitive with the state-of-the-art in-memory compiled database system HyPer [Neumann 2011] while using significantly less memory


Summary

INTRODUCTION

Traditional database systems are generic and powerful, but they are not well optimized for static databases. A static database is one where the data changes slowly or not at all and the queries are fixed. These two constraints create opportunities for aggressive optimization and specialization. Castor achieves high performance by combining query compilation techniques from state-of-the-art in-memory databases [Neumann 2011] with a new deductive synthesis approach for generating specialized data structures. For example, because the queries are fixed, not all of the data in the original database is needed, and some attributes are used only in aggregates. As another example, consider a company which is shipping.
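The observation that a fixed workload needs only part of the data can be illustrated with a small Python sketch. The table, its columns, and the single fixed query ("total price for a given customer") are hypothetical. The specialization shown here, which drops unread attributes and keeps only a precomputed per-customer sum, is a simple hand-written stand-in and is not Castor's synthesis procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical orders table: (order_id, customer, price, comment).
Order = Tuple[int, str, float, str]

ORDERS: List[Order] = [
    (1, "alice", 10.0, "fragile"),
    (2, "bob", 5.0, "leave at door"),
    (3, "alice", 7.5, ""),
]


def generic_query(orders: List[Order], customer: str) -> float:
    # A generic engine keeps every attribute and answers the fixed query
    # "total price for a given customer" by scanning the whole table.
    return sum(price for _, cust, price, _ in orders if cust == customer)


def build_specialized_store(orders: List[Order]) -> Dict[str, float]:
    # Because the query is fixed, order_id and comment are never read, and
    # price is only used inside SUM, so only the per-customer sum is kept.
    store: Dict[str, float] = {}
    for _, cust, price, _ in orders:
        store[cust] = store.get(cust, 0.0) + price
    return store


def specialized_query(store: Dict[str, float], customer: str) -> float:
    # Answering the query is now a single dictionary lookup.
    return store.get(customer, 0.0)


store = build_specialized_store(ORDERS)
print(store)  # {'alice': 17.5, 'bob': 5.0}
```

Both versions return the same totals, but the specialized store holds two numbers instead of twelve attribute values. This trade-off is only valid while the query and the data stay fixed, which is exactly the static-database setting described above.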

Slower Results
Contributions
Limitations
MOTIVATING EXAMPLE
Background
The Layout Algebra
Optimization Trade-Offs
Nested Layout
LANGUAGE
Syntax
Relational Semantics
Staging
TRANSFORMATIONS
Partitioning
Join Elimination
Predicate Precomputation
Correctness
OPTIMIZATION
Scheduling
Cost Model
COMPILATION
Layout Semantics
Data Structure Specialization
Runtime Semantics
Performance of Staging
EVALUATION
TPC-H Analytics Benchmark
Performance of Multiple Query Workloads
Summary
RELATED WORK
CONCLUSION
A CORRECTNESS OF FILTER ELIMINATION
B SEMANTICS OF THE LAYOUT ALGEBRA