Abstract

Integrity constraints such as functional dependencies (FDs) and multi-valued dependencies (MVDs) are fundamental in database schema design. Likewise, probabilistic conditional independences (CIs) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper, we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and that of the consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Third, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Finally, we show how some of the results in the paper can be derived using the I-measure theory, which relates information-theoretic measures to set theory. Our results recover, and sometimes extend, previously known results about the implication problem: the implication of MVDs and FDs can be checked by considering only 2-tuple relations.
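The shape of the inequality described above can be sketched as follows. The notation is an assumption made for illustration and is not fixed by the abstract itself: $\sigma_1,\dots,\sigma_k$ stand for the antecedents, $\tau$ for the consequent, $h(\cdot)$ for the information-theoretic degree of violation of a constraint (zero when the constraint holds exactly), and $\lambda$ for the relaxation factor.

```latex
% Exact implication: whenever every antecedent holds exactly, so does the consequent.
\[
  h(\sigma_1) = \dots = h(\sigma_k) = 0 \;\Longrightarrow\; h(\tau) = 0
\]
% Approximate implication with factor \lambda (a "\lambda-relaxation"):
\[
  h(\tau) \;\le\; \lambda \sum_{i=1}^{k} h(\sigma_i), \qquad \lambda \ge 0
\]
% One natural choice of h, assumed here: conditional entropy for an FD,
% conditional mutual information for an MVD.
\[
  h(X \to Y) = h(Y \mid X), \qquad
  h(X \twoheadrightarrow Y \mid Z) = I(Y; Z \mid X)
\]
```

Under this reading, the approximate implication recovers the exact one as a special case: if every $h(\sigma_i)$ is zero, the right-hand side vanishes and forces $h(\tau) = 0$.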

Highlights

  • Integrity constraints are assertions about a database that are stated by the database administrator and enforced by the system during updates

  • In this paper we consider a new problem, called the relaxation problem: if an exact implication holds, does an approximate implication hold too? For example, suppose we prove that a given set of functional dependencies (FD) implies another FD, but the input data satisfies the antecedent FDs only to some degree: to what degree does the consequent FD hold on the database? An approximate implication (AI) is an inequality that bounds the consequent by a linear combination of the antecedents

  • When the consequent is an FD, we show that implication admits a 1-relaxation
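As a concrete illustration of a 1-relaxation, consider the exact implication {X → Y, Y → Z} ⇒ X → Z. The sketch below measures how far an FD is from holding on a small relation by the conditional entropy of its right-hand side given its left-hand side, and checks that the violation of X → Z is bounded by the sum of the violations of the two antecedents. The helper names (`entropy`, `fd_violation`), the toy relation, and the choice of the uniform distribution over tuples are assumptions made for this example, not definitions quoted from the paper.

```python
import math
from collections import Counter


def entropy(rows, attrs):
    """Shannon entropy (in bits) of the projection of `rows` onto `attrs`,
    under the uniform distribution over the tuples of the relation."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def fd_violation(rows, lhs, rhs):
    """Conditional entropy h(rhs | lhs) = h(lhs, rhs) - h(lhs).
    It is zero exactly when the FD lhs -> rhs holds on `rows`."""
    return entropy(rows, lhs + rhs) - entropy(rows, lhs)


# Hypothetical relation in which neither X -> Y nor Y -> Z holds exactly.
rows = [
    {"X": 0, "Y": 0, "Z": 0},
    {"X": 0, "Y": 1, "Z": 1},
    {"X": 1, "Y": 1, "Z": 1},
    {"X": 1, "Y": 1, "Z": 0},
]

antecedents = (fd_violation(rows, ("X",), ("Y",))
               + fd_violation(rows, ("Y",), ("Z",)))
consequent = fd_violation(rows, ("X",), ("Z",))

# 1-relaxation: h(Z | X) <= h(Y | X) + h(Z | Y)
print(f"h(Z|X) = {consequent:.3f}, h(Y|X) + h(Z|Y) = {antecedents:.3f}")
```

This particular inequality follows from standard entropy identities, since h(Z|X) ≤ h(YZ|X) = h(Y|X) + h(Z|XY) ≤ h(Y|X) + h(Z|Y), so the check passes on any relation and not only on this one.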

Summary

Introduction

Integrity constraints are assertions about a database that are stated by the database administrator and enforced by the system during updates. [...] Problem (FIS) [22, 5], or as measure-based constraints [32] in applications like Dempster-Shafer theory, possibilistic theory, and game theory (see the discussion in [32]). In all these applications, the constraints are quite often learned from the data and are not required to hold exactly; it suffices that they hold to a certain degree. The classical implication problem asks whether a set of constraints, called the antecedents, logically implies another constraint, called the consequent.

Results
Integrity Constraints and Conditional Independence
Background on Information Theory
Discussion
Definition of the Relaxation Problem
Relaxation for FDs and MVDs
Proof of Theorem 9
Proof of Theorem 6 Item 2
Relaxation for General CIs
Restricted Axioms
Restricted Models
Discussion and Future
A Example for Section 5
B Proof of Cone Properties and Identities from Section 5
C Proof of Lemma 22