Abstract
Modern data management systems extensively use parallelism to speed up query processing over massive volumes of data. This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation. In this paper, we go beyond joins and study the parallel evaluation of recursive queries. We introduce a novel framework to reason about multi-round evaluation of Datalog programs, which combines implicit predicate restriction with distribution policies to allow expressing a combination of data-parallel and query-parallel evaluation strategies. Using our framework, we reason about key properties of distributed Datalog evaluation, including parallel-correctness of the evaluation strategy, disjointness of the computation effort, and bounds on the number of communication rounds.
Highlights
Modern data management systems – such as Spark [27, 33], Hadoop [16, 11], and others [17] – have extensively used parallelism to speed up query processing over massive volumes of data.
We show that an economic policy can capture several algorithms used for parallel evaluation of recursive and non-recursive queries, including the Hypercube algorithm [13, 4], and the decomposable strategies based on program restrictions [30].
To overcome the undecidability of parallel-correctness, we identify a general family of economic policies, called Generalized Hypercube Policies (GHPs), which are always parallel-correct, and further capture several commonly used parallel evaluation strategies.
Summary
Modern data management systems – such as Spark [27, 33], Hadoop [16, 11], and others [17] – have extensively used parallelism to speed up query processing over massive volumes of data. To reason about Hypercube-like algorithms, Ameloot et al. [6] recently introduced a framework that captures one-round evaluation of joins under different data distributions. Their framework implicitly describes a single-round parallel algorithm through a distribution policy, which specifies how the facts in the input relations are distributed among the machines. We show that an economic policy can capture several algorithms used for parallel evaluation of recursive and non-recursive queries, including the Hypercube algorithm [13, 4], and the decomposable strategies based on program restrictions [30]. In this framework, we study several properties of economic policies. We ask which Datalog programs admit economic policies that are bounded by one round: we show that such programs are characterized by a syntactic property called pivoting, which was identified by Wolfson and Silberschatz [32] in the context of decomposable programs.
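To make the notion of a distribution policy concrete, the following is a minimal sketch of a Hypercube-style distribution for the triangle join R(x, y), S(y, z), T(z, x) on a p1 × p2 × p3 grid of servers. It is an illustration under stated assumptions, not the paper's formal definition: the hash function `h`, the helper names, the grid sizes, and the toy edge set are all hypothetical choices made for this example. Each fact is sent to every server whose coordinates agree with the hashes of the variables the fact binds, each server joins its local fragment, and the union of the local results is compared against the centralized answer.

```python
from itertools import product

def h(value, buckets):
    # Toy deterministic hash (an assumption for this sketch only).
    return value % buckets

def hypercube_distribute(R, S, T, p1, p2, p3):
    """Send each fact to the grid servers that agree with its hashed values."""
    servers = {addr: {"R": set(), "S": set(), "T": set()}
               for addr in product(range(p1), range(p2), range(p3))}
    for (a, b) in R:  # R(x, y) fixes the x and y coordinates, replicated along z
        for k in range(p3):
            servers[(h(a, p1), h(b, p2), k)]["R"].add((a, b))
    for (b, c) in S:  # S(y, z) fixes y and z, replicated along x
        for i in range(p1):
            servers[(i, h(b, p2), h(c, p3))]["S"].add((b, c))
    for (c, a) in T:  # T(z, x) fixes z and x, replicated along y
        for j in range(p2):
            servers[(h(a, p1), j, h(c, p3))]["T"].add((c, a))
    return servers

def triangles(R, S, T):
    """Naive evaluation of the join R(x, y), S(y, z), T(z, x)."""
    return {(a, b, c)
            for (a, b) in R
            for (b2, c) in S if b2 == b
            for (c2, a2) in T if c2 == c and a2 == a}

def local_results(R, S, T, p1=2, p2=2, p3=2):
    """Evaluate the join independently on every server's local fragment."""
    servers = hypercube_distribute(R, S, T, p1, p2, p3)
    return [triangles(s["R"], s["S"], s["T"]) for s in servers.values()]

# Hypothetical toy input: one directed edge set used for all three relations.
edges = {(1, 2), (2, 3), (3, 1), (3, 4), (4, 1), (2, 5)}
central = triangles(edges, edges, edges)
per_server = local_results(edges, edges, edges)
distributed = set().union(*per_server)

print(distributed == central)                       # union of local answers vs. centralized
print(sum(len(r) for r in per_server) == len(central))  # each answer found on one server
```

On this input, the union of the local answers coincides with the centralized answer, which is the instance-level behaviour that parallel-correctness asks for on every input. The second check shows that no answer is derived on more than one server here, since a triangle (a, b, c) can only be assembled at the server with coordinates (h(a), h(b), h(c)).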