DIFF: a relational interface for large-scale data explanation

Firas Abuzaid, Erik Meijer, Matei Zaharia, Asvin Ananthanarayan, Eric Xu, John Sheu, Xi Wu, Peter Kraft, Edward Gan, Peter Bailis, Sahaana Suri, Atul Shenoy, Jeff Naughton

doi:10.1007/s00778-020-00633-6

Abstract

A range of explanation engines assist data analysts by performing feature selection over increasingly high-volume and high-dimensional data, grouping and highlighting commonalities among data points. While useful in diverse tasks such as user behavior analytics, operational event processing, and root-cause analysis, today’s explanation engines are designed as stand-alone data processing tools that do not interoperate with traditional, SQL-based analytics workflows; this limits the applicability and extensibility of these engines. In response, we propose the DIFF operator, a relational aggregation operator that unifies the core functionality of these engines with declarative relational query processing. We implement both single-node and distributed versions of the DIFF operator in MB SQL, an extension of MacroBase, and demonstrate how DIFF can provide the same semantics as existing explanation engines while capturing a broad set of production use cases in industry, including at Microsoft and Facebook. Additionally, we illustrate how this declarative approach to data explanation enables new logical and physical query optimizations. We evaluate these optimizations on several real-world production applications and find that DIFF in MB SQL can outperform state-of-the-art engines by up to an order of magnitude.
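
To make the idea of a relational explanation operator concrete, the following is a minimal Python sketch of comparing a "test" relation against a "control" relation and reporting attribute-value combinations that are over-represented in the test rows. It is not the MB SQL implementation. The function name `diff`, the parameters `min_risk_ratio`, `min_support`, and `max_order`, and the choice of risk ratio and support as the comparison metrics are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def diff(test, control, attrs, min_risk_ratio=2.0, min_support=0.05, max_order=2):
    """Illustrative only: find attribute-value combinations that are
    over-represented in `test` rows relative to `control` rows.

    `test` and `control` are lists of dicts; `attrs` names the columns to explain over.
    """
    def count(rows, cols):
        # Count how often each combination of (column, value) pairs occurs.
        return Counter(tuple((c, r[c]) for c in cols) for r in rows)

    results = []
    for order in range(1, max_order + 1):
        for cols in combinations(attrs, order):
            t_counts = count(test, cols)
            c_counts = count(control, cols)
            for combo, a in t_counts.items():
                b = c_counts.get(combo, 0)
                # Support: fraction of test rows matching the combination.
                support = a / len(test)
                # Risk ratio: P(test | combo) / P(test | not combo).
                exposed = a + b
                unexposed = (len(test) - a) + (len(control) - b)
                p_exposed = a / exposed
                p_unexposed = (len(test) - a) / unexposed if unexposed else 0.0
                rr = float("inf") if p_unexposed == 0 else p_exposed / p_unexposed
                if support >= min_support and rr >= min_risk_ratio:
                    results.append((dict(combo), support, rr))
    return results


# Hypothetical data: crash logs (test) versus successful sessions (control).
crashes = [{"version": "v2"}, {"version": "v2"}, {"version": "v2"}, {"version": "v1"}]
successes = [{"version": "v2"}] + [{"version": "v1"}] * 5
explanations = diff(crashes, successes, ["version"], max_order=1)
```

This sketch enumerates every combination exhaustively, which would not scale to high-dimensional data. The abstract's point is that expressing this comparison as a relational operator allows a query optimizer to apply logical and physical optimizations instead.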

Journal: The VLDB Journal
Publication Date: Sep 30, 2020
Citations: 10
