Detecting Maximum Inclusion Dependencies without Candidate Generation

Nuhad Shaabani, Christoph Meinel

doi:10.1007/978-3-319-44406-2_10

Abstract

Inclusion dependencies (INDs) within and across databases are important relationships for many applications in data integration, schema (re-)design, integrity checking, and query optimization. Existing techniques for detecting all INDs need to generate IND candidates and test their validity against the given data instance. The major disadvantage of this approach is that the number of data accesses, measured in SQL queries as well as I/O operations, grows exponentially. We introduce Mind\(_2\), a new approach for detecting n-ary INDs (\(n > 1\)) without any candidate generation. Mind\(_2\) implements a new characterization of the maximum INDs that we develop in this paper. This characterization is based on set operations defined on certain metadata, which Mind\(_2\) generates with a number of database accesses equal to only twice the number of valid unary INDs. Thus, Mind\(_2\) eliminates the exponential number of data accesses needed by existing approaches. Furthermore, our experiments show that Mind\(_2\) is significantly more scalable than hypergraph-based approaches.
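To make the notion of an n-ary IND concrete, the following minimal Python sketch tests whether one candidate IND holds on small in-memory tables. It only illustrates the validity test that candidate-based techniques repeat for every candidate; it is not the Mind\(_2\) algorithm. The function name `ind_holds`, the sample tables, and the column names are hypothetical, and NULL handling is ignored.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def ind_holds(
    dependent_rows: List[Row],
    dependent_cols: Sequence[str],
    referenced_rows: List[Row],
    referenced_cols: Sequence[str],
) -> bool:
    """Return True if every value combination of dependent_cols also
    occurs as a value combination of referenced_cols (hypothetical helper)."""
    if len(dependent_cols) != len(referenced_cols):
        raise ValueError("both sides of an IND must have the same arity")
    # Project the referenced table onto its columns once.
    referenced = {tuple(row[c] for c in referenced_cols) for row in referenced_rows}
    # The IND is valid iff each projected dependent tuple is contained in it.
    return all(
        tuple(row[c] for c in dependent_cols) in referenced
        for row in dependent_rows
    )


# Illustrative data only (not taken from the paper).
customers = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": "FR"},
    {"id": 3, "country": "US"},
]
orders = [
    {"cust_id": 1, "country": "DE"},
    {"cust_id": 2, "country": "FR"},
]
mismatched_orders = [{"cust_id": 1, "country": "FR"}]

binary_valid = ind_holds(orders, ["cust_id", "country"], customers, ["id", "country"])
# Both unary INDs hold for mismatched_orders, yet the binary IND does not.
unary_id = ind_holds(mismatched_orders, ["cust_id"], customers, ["id"])
unary_country = ind_holds(mismatched_orders, ["country"], customers, ["country"])
binary_invalid = ind_holds(
    mismatched_orders, ["cust_id", "country"], customers, ["id", "country"]
)
```

The `mismatched_orders` case shows why valid unary INDs alone do not determine the n-ary ones: each column is individually included, but the combined tuple is not.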
