Rumble

Ingo Müller, Stefan Irimescu, Ghislain Fourny, Gustavo Alonso, Can Berker Cikis

doi:10.14778/3436905.3436910

Open Access

Journal: Proceedings of the VLDB Endowment
Publication Date: Dec 1, 2020
Citations: 10
License type: CC BY-NC-ND

Affiliation: ETH Zurich, Riverkeeper

Abstract

This paper introduces Rumble, a query execution engine for large, heterogeneous, and nested collections of JSON objects built on top of Apache Spark. While data sets of this type are increasingly widespread, most existing tools are built around a tabular data model, creating an impedance mismatch for both the engine and the query interface. In contrast, Rumble uses JSONiq, a standardized language specifically designed for querying JSON documents. The key challenge in the design and implementation of Rumble is mapping the recursive structure of JSON documents and JSONiq queries onto Spark's execution primitives, which are based on tabular data frames. Our solution is to translate a JSONiq expression into a tree of iterators that dynamically switch between local and distributed execution modes depending on the nesting level. By overcoming the impedance mismatch in the engine, Rumble frees the user from solving the same problem for every single query, thus increasing their productivity considerably. As we show in extensive experiments, Rumble is able to scale to large and complex data sets in the terabyte range with performance similar to or better than that of other engines. The results also illustrate that Codd's concept of data independence makes as much sense for heterogeneous, nested data sets as it does for highly structured tables.
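
The central mechanism described above, a tree of iterators whose nodes run either locally or in a distributed fashion depending on the nesting level, can be illustrated with a toy model. The following is a minimal Python sketch under clearly stated assumptions: it is not Rumble's implementation and uses no Spark APIs. A "distributed" sequence is simulated as a plain list of partitions, and every name in it (`ToyIterator`, `Collection`, `Literal`, `Map`, `run`) is hypothetical. The outer level iterates over a large, partitioned collection of JSON-like objects, while the nested expression evaluated for each object runs locally.

```python
from typing import Any, Callable, List

# Hypothetical toy model, not Rumble's code: a "distributed" sequence is a list
# of partitions (standing in for a Spark data set); a "local" one is a list.
Partitions = List[List[Any]]


class ToyIterator:
    """A node in a hypothetical iterator tree."""

    def is_distributed(self) -> bool:
        raise NotImplementedError

    def local(self) -> List[Any]:
        """Materialize all items in one process (local mode)."""
        raise NotImplementedError

    def partitions(self) -> Partitions:
        """Produce items partition by partition (simulated distributed mode)."""
        raise NotImplementedError


class Collection(ToyIterator):
    """Leaf over a large input collection; always runs distributed."""

    def __init__(self, parts: Partitions):
        self.parts = parts

    def is_distributed(self) -> bool:
        return True

    def partitions(self) -> Partitions:
        return self.parts

    def local(self) -> List[Any]:
        return [x for part in self.parts for x in part]


class Literal(ToyIterator):
    """Leaf over a small in-memory sequence (e.g. a nested array); always local."""

    def __init__(self, items: List[Any]):
        self.items = items

    def is_distributed(self) -> bool:
        return False

    def partitions(self) -> Partitions:
        return [list(self.items)]

    def local(self) -> List[Any]:
        return list(self.items)


class Map(ToyIterator):
    """Applies fn to each item and inherits the execution mode of its child."""

    def __init__(self, child: ToyIterator, fn: Callable[[Any], Any]):
        self.child = child
        self.fn = fn

    def is_distributed(self) -> bool:
        return self.child.is_distributed()

    def partitions(self) -> Partitions:
        # Apply fn independently within each partition (no data movement).
        return [[self.fn(x) for x in part] for part in self.child.partitions()]

    def local(self) -> List[Any]:
        return [self.fn(x) for x in self.child.local()]


def run(it: ToyIterator) -> List[Any]:
    """Choose the execution mode at run time from the shape of the tree."""
    if it.is_distributed():
        return [x for part in it.partitions() for x in part]
    return it.local()


# Outer level: a distributed pass over a partitioned collection of JSON objects.
docs = Collection([
    [{"id": 1, "scores": [1, 2]}, {"id": 2, "scores": [3]}],
    [{"id": 3, "scores": []}],
])
# Inner level: for each object, a nested iterator over its array runs locally.
query = Map(docs, lambda d: {"id": d["id"], "total": sum(run(Literal(d["scores"])))})
print(run(query))
# [{'id': 1, 'total': 3}, {'id': 2, 'total': 3}, {'id': 3, 'total': 0}]
```

In this toy, the outer `Map` reports `is_distributed()` as true because its child is the partitioned `Collection`, whereas the `Literal` built inside the lambda for each object's nested array is evaluated in local mode. The same `Map` class serves both levels, which is the point of letting each node decide its mode from its inputs rather than fixing it per operator.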

Similar Papers

Challenges and opportunities of generative models on tabular data
Alex X Wang ... Binh P Nguyen
Applied Soft Computing | VOL. 166
07 Sep 2024

A virtual environment for the exploration of diffusion and flow phenomena in complex geometries
Robert G Belleman ... Peter M.A Sloot
Future Generation Computer Systems | VOL. 14
01 Aug 1998

Deep Learning and Lung Cancer: AI to Extract Information Hidden in Routine CT Scans.
Kitt Shaffer
Radiology | VOL. 296
12 May 2020

Water quality assessment and source identification of the Shuangji River (China) using multivariate statistical methods
Ruxue Liu ... Hongbin Xu
22 Jan 2021