Distribution Policies for Datalog

Bas Ketsman, Paraschos Koutris, Aws Albarghouthi

doi:10.1007/s00224-019-09959-3

Abstract

Modern data management systems extensively use parallelism to speed up query processing over massive volumes of data. This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation. In this paper, we go beyond joins and study the parallel evaluation of recursive queries. We introduce a novel framework to reason about multi-round evaluation of Datalog programs, which combines implicit predicate restriction with distribution policies to allow expressing a combination of data-parallel and query-parallel evaluation strategies. Using our framework, we reason about key properties of distributed Datalog evaluation, including parallel-correctness of the evaluation strategy, disjointness of the computation effort, and bounds on the number of communication rounds.

Highlights

Modern data management systems – such as Spark [27, 33], Hadoop [16, 11], and others [17] – have extensively used parallelism to speed up query processing over massive volumes of data
We show that an economic policy can capture several algorithms used for parallel evaluation of recursive and non-recursive queries, including the Hypercube algorithm [13, 4], and the decomposable strategies based on program restrictions [30]
To overcome the undecidability of parallel-correctness, we identify a general family of economic policies, called Generalized Hypercube Policies (GHPs), which are always parallel-correct, and further capture several commonly used parallel evaluation strategies

Summary

Introduction

Modern data management systems – such as Spark [27, 33], Hadoop [16, 11], and others [17] – have extensively used parallelism to speed up query processing over massive volumes of data. To reason about Hypercube-like algorithms, Ameloot et al. [6] recently introduced a framework that captures one-round evaluation of joins under different data distributions. Their framework implicitly describes a single-round parallel algorithm through a distribution policy, which specifies how the facts in the input relations are distributed among the machines. We show that an economic policy can capture several algorithms used for parallel evaluation of recursive and non-recursive queries, including the Hypercube algorithm [13, 4] and the decomposable strategies based on program restrictions [30]. Within this framework we study several properties of economic policies. In particular, we ask which Datalog programs admit economic policies that are bounded by one round, and we show that such programs are characterized by a syntactic property called pivoting, which was identified by Wolfson and Silberschatz [32] in the context of decomposable programs.
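To make the notion of a distribution policy concrete, the following is a minimal sketch of a Hypercube-style policy for the triangle join T(x, y, z) ← R(x, y), S(y, z), U(z, x), evaluated in a single round. It is an illustration only and is not taken from the paper: the grid size `P`, the toy hash function `h`, and the helper names `policy`, `distribute`, `triangles`, and `one_round` are all assumptions. Each fact is sent to every machine whose grid coordinates agree with the hashes of the values it binds, and the query is then evaluated locally on each machine.

```python
from itertools import product

P = 2  # assumed grid side length; machines form a P x P x P grid


def h(value):
    """Toy hash (an assumption) mapping a domain value to a grid coordinate."""
    return hash(value) % P


def policy(relation, fact):
    """Distribution policy: the set of machines responsible for a fact."""
    a, b = fact
    if relation == "R":  # R(x, y): x and y are fixed, z is replicated
        return {(h(a), h(b), k) for k in range(P)}
    if relation == "S":  # S(y, z): y and z are fixed, x is replicated
        return {(i, h(a), h(b)) for i in range(P)}
    if relation == "U":  # U(z, x): z and x are fixed, y is replicated
        return {(h(b), j, h(a)) for j in range(P)}
    raise ValueError(f"unknown relation: {relation}")


def triangles(db):
    """Evaluate T(x, y, z) <- R(x, y), S(y, z), U(z, x) on one database."""
    return {
        (x, y, z)
        for (x, y) in db["R"]
        for (y2, z) in db["S"] if y2 == y
        for (z2, x2) in db["U"] if z2 == z and x2 == x
    }


def distribute(db):
    """Build the local database of every machine according to the policy."""
    local = {
        cell: {"R": set(), "S": set(), "U": set()}
        for cell in product(range(P), repeat=3)
    }
    for relation, facts in db.items():
        for fact in facts:
            for cell in policy(relation, fact):
                local[cell][relation].add(fact)
    return local


def one_round(db):
    """Union of the results that each machine computes on its local data."""
    result = set()
    for local_db in distribute(db).values():
        result |= triangles(local_db)
    return result


db = {
    "R": {(1, 2), (2, 3), (4, 5)},
    "S": {(2, 3), (3, 1), (5, 6)},
    "U": {(3, 1), (1, 2), (6, 7)},
}

# The distributed result coincides with the centralized one on this input.
assert one_round(db) == triangles(db)
```

In this sketch, the three facts that form any triangle (x, y, z) all reach the machine at coordinates (h(x), h(y), h(z)), so no answer is lost, and because the query is monotone no machine can produce a spurious answer from its subset of the data. That is the intuition behind one-round parallel-correctness for this particular policy; the recursive, multi-round setting studied in the paper is not covered by this example.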

Parallel Complexity

Decomposability

Other Parallel Schemes

Systems

Preliminaries

Datalog

Evaluation Semantics

Proof Theoretic Concepts

The Framework

Datalog Evaluation Modulo Policies

Distributed Evaluation Strategy

Parallel-Correctness

Generalized Hypercube Policies

Weakly Pivoting GHPs

Weakly Pivoting Datalog

Bounded and Disjoint Evaluation

Conclusion

A Appendix


Journal: Theory of Computing Systems	Publication Date: Dec 4, 2019
Citations: 2	License type: cc-by
