Compilation Techniques for Incremental Collection Processing

Daniel Lupei

doi:10.5075/epfl-thesis-8019

Abstract

Many map-reduce frameworks as well as NoSQL systems rely on collection programming as their interface of choice due to its rich semantics along with an easily parallelizable set of primitives. Unfortunately, the potential of collection programming is not entirely fulfilled by current systems as they lack efficient incremental view maintenance (IVM) techniques for queries producing large nested results. This comes as a consequence of the fact that the nesting of collections does not enjoy the same algebraic properties underscoring the optimization potential of typical collection processing constructs. We propose the first solution for the efficient incrementalization of collection programming in terms of its core constructs as captured by the positive nested relational calculus (NRC+) on bags (with integer multiplicities). We take an approach based on delta query derivation, whose goal is to generate delta queries which, given a small change in the input, can update the materialized view more efficiently than via recomputation. More precisely, we model the cost of NRC+ operators and classify queries as efficiently incrementalizable if their delta has a strictly lower cost than full re-evaluation. Then, we identify IncNRC+, a large fragment of NRC+ that is efficiently incrementalizable and we provide a semantics-preserving translation that takes any NRC+ query to a collection of IncNRC+ queries. Furthermore, we prove that incrementalmaintenance for NRC+ is within the complexity class NC0 and we showcase how Recursive IVM, a technique that has provided significant speedups over traditional IVM in the case of flat queries, can also be applied to IncNRC+ . Existing systems are also limited wrt. the size of inner collections that they can effectively handle before running into severe performance bottlenecks. In particular, in the face of nested collections with skewed cardinalities developers typically have to undergo a painful process of manual query re-writes in order to ensure that the largest inner collections in their workloads are not impacted by these limitations. To address these issues we developed SLeNDer, a compilation framework that given a nested query generates a set of semantically equivalent (partially) shredded queries that can be efficiently evaluated and incrementalized using state of the art techniques for handling skew and applying delta changes, respectively. The derived queries expose nested collections to the same opportunities for distributing their processing and incrementally updating their contents as those enjoyed by top-level collections, leading on our benchmark to up to 16.8x and 21.9x speedups in terms of offline and online processing, respectively. In order to enable efficient IVM for the increasingly common case of collection programming with functional values as in Links, we also discuss the efficient incrementalization of simplytyped lambda calculi, under the constraint that their primitives are themselves efficiently incrementalizable.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

R Discovery Prime

R Discovery Prime

Compilation Techniques for Incremental Collection Processing

Abstract

Talk to us

Similar Papers

Lead the way for us

Similar Papers

Efficient Online Processing for Advanced Analytics
...
-
, et. al. ...
01 Jan 2017
01 Jan 2017

Nest-Site Selection and Nesting Success of Cavity-Nesting Birds in High Elevation Forest Drainages
...
The Auk | VOL. 108
, et. al. ...
20 Nov 2009
The Auk | VOL. 108

Multi-dimensional dependency grammar as multigraph description
...
-
, et. al. ...
01 Jan 2006
01 Jan 2006

Incremental View Maintenance For Collection Programming
Christoph Koch ... Val Tannen
-
Christoph Koch, et. al.Christoph Koch ... Val Tannen
15 Jun 2016
15 Jun 2016

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Compilation Techniques for Incremental Collection Processing

Abstract

Talk to us

Similar Papers