Abstract

Probabilistic databases address the requirements of applications that produce large collections of uncertain data. They should provide declarative means to control the integrity of data. Cardinality constraints, in particular, control the occurrences of data patterns by declaring in how many records a combination of data values can occur. We propose cardinality constraints on probabilistic data, which stipulate lower bounds on the marginal probability with which a cardinality constraint holds. We investigate limits and opportunities for automating their use in integrity control. This includes hardness results for their validation, axiomatic and efficient algorithmic characterisations of their implication problem, and an algorithm that computes succinct semantic summaries for any collection of these constraints. Experiments complement our theoretical analysis of the time and space complexity of computing semantic summaries, suggesting that their computation provides the basis to acquire meaningful constraints. We also establish evidence that probabilistic functional and inclusion dependencies cannot be managed as simply as probabilistic cardinality constraints.
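
To make the central notion concrete, the following minimal sketch checks a probabilistic cardinality constraint by brute force. It assumes, purely for illustration, that a probabilistic database is given as an explicit list of possible worlds, each a relation paired with its probability; this representation, the function names, and the toy data are hypothetical and are not taken from the paper. It is not the validation or implication algorithm the paper studies.

```python
from collections import Counter


def satisfies_cardinality(relation, attributes, bound):
    """True if no combination of values on `attributes` occurs in more than
    `bound` records of `relation` (a list of dicts)."""
    counts = Counter(tuple(record[a] for a in attributes) for record in relation)
    return all(count <= bound for count in counts.values())


def marginal_probability(worlds, attributes, bound):
    """Sum the probabilities of the possible worlds in which the
    cardinality constraint holds. `worlds` is a list of (relation, probability)
    pairs; this explicit possible-worlds encoding is an assumption."""
    return sum(
        probability
        for relation, probability in worlds
        if satisfies_cardinality(relation, attributes, bound)
    )


def satisfies_probabilistic_cardinality(worlds, attributes, bound, min_probability):
    """True if the cardinality constraint holds with marginal probability
    at least `min_probability` (the stipulated lower bound)."""
    return marginal_probability(worlds, attributes, bound) >= min_probability


# Hypothetical toy data: two possible worlds over the attributes emp and proj.
worlds = [
    ([{"emp": "ann", "proj": "p1"},
      {"emp": "ann", "proj": "p2"}], 0.6),
    ([{"emp": "ann", "proj": "p1"},
      {"emp": "ann", "proj": "p2"},
      {"emp": "ann", "proj": "p3"}], 0.4),
]

# "Each emp value occurs in at most 2 records" holds only in the first world.
holds_at_half = satisfies_probabilistic_cardinality(worlds, ["emp"], 2, 0.5)
holds_at_three_quarters = satisfies_probabilistic_cardinality(worlds, ["emp"], 2, 0.75)
```

In this toy instance the constraint holds only in the first world, so its marginal probability is the probability of that world, and the lower bound 0.5 is met while 0.75 is not. Enumerating worlds explicitly is only practical for tiny examples.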
