Divide & conquer-based inclusion dependency discovery

Thorsten Papenbrock, Jorge-Arnulfo Quiané-Ruiz, Sebastian Kruse, Felix Naumann

doi:10.14778/2752939.2752946

Abstract

The discovery of all inclusion dependencies (INDs) in a dataset is an important part of any data profiling effort. Apart from the detection of foreign key relationships, INDs can help with data integration, query optimization, integrity checking, or schema (re-)design. However, the detection of INDs becomes harder as datasets grow larger in terms of both the number of tuples and the number of attributes. To address this, we propose Binder, an IND detection system capable of detecting both unary and n-ary INDs. It is based on a divide & conquer approach, which allows it to handle very large datasets, an important property in the face of the ever-increasing size of today's data. In contrast to most related work, we neither rely on existing database functionality nor assume that the inspected datasets fit into main memory. This makes Binder an efficient and scalable competitor. Our exhaustive experimental evaluation shows the clear superiority of Binder over the state of the art in both unary (Spider) and n-ary (Mind) IND discovery: Binder is up to 26x faster than Spider and more than 2500x faster than Mind.
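To make the divide & conquer idea concrete for unary INDs, here is a minimal Python sketch of IND discovery by hash-partitioning. It is not Binder's actual implementation. The function name `unary_inds_by_buckets`, the `columns` input (a dict of in-memory value lists) and the `num_buckets` parameter are hypothetical, and ignoring NULL values is an assumption. The sketch only shows why partitioning is sound: equal values always hash to the same bucket, so an IND A ⊆ B holds exactly when it holds in every bucket, and a candidate refuted in one bucket never needs to be checked again.

```python
from itertools import permutations


def unary_inds_by_buckets(columns, num_buckets=4):
    """Return all valid unary INDs (a, b), meaning every value of a occurs in b.

    columns: dict mapping attribute name -> iterable of values (hypothetical input).
    num_buckets: number of hash partitions (hypothetical parameter).
    """
    # Divide: hash-partition each column's distinct values into buckets.
    # Equal values always get the same bucket index, so a ⊆ b holds
    # overall exactly when it holds for every bucket index separately.
    buckets = {name: [set() for _ in range(num_buckets)] for name in columns}
    for name, values in columns.items():
        for value in values:
            if value is None:  # assumption: NULLs are ignored
                continue
            buckets[name][hash(value) % num_buckets].add(value)

    # Every ordered pair of distinct attributes starts as a candidate.
    candidates = set(permutations(columns, 2))

    # Conquer: validate bucket by bucket; a candidate refuted in one
    # bucket is dropped and never checked again.
    for i in range(num_buckets):
        candidates = {(a, b) for (a, b) in candidates
                      if buckets[a][i] <= buckets[b][i]}
        if not candidates:
            break
    return candidates


if __name__ == "__main__":
    cols = {
        "orders.customer_id": [1, 2, 2, 3],
        "customers.id": [1, 2, 3, 4],
        "customers.age": [30, 41, 3],
    }
    print(unary_inds_by_buckets(cols))
    # {('orders.customer_id', 'customers.id')}
```

For brevity every bucket stays in memory here. A disk-based variant would write each bucket to its own file and load only bucket i of each column during validation, which is the kind of property that lets a divide & conquer approach avoid assuming that the whole dataset fits into main memory.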


Journal: Proceedings of the VLDB Endowment
Publication Date: Feb 1, 2015
Citations: 69

Similar Papers

Improving the Efficiency of Inclusion Dependency Detection
Nuhad Shaabani ... Christoph Meinel
17 Oct 2018

On discovering and incrementally updating inclusion dependencies
01 Jan 2020

Detecting Maximum Inclusion Dependencies without Candidate Generation
Nuhad Shaabani ... Christoph Meinel
01 Jan 2015

Fast Accurate Discovery of Tuple Inclusion Dependencies
Mengfei Shen ... Kazuhiro Saito
01 Jun 2022
