Data repairing algorithms have been extensively studied as a means of improving data quality. Denial constraints (DCs) are commonly employed to state the quality specifications that data should satisfy, and hence to facilitate data repairing, since DCs are general enough to subsume many other dependencies. In practice, data are frequently updated, which motivates the quest for efficient incremental repairing techniques in response to data updates. In this paper, we present the first incremental algorithm for repairing DC violations. Specifically, given a relational instance I consistent with a set Σ of DCs, and a set △I of tuple insertions to I, our aim is to find a set △I′ of tuple insertions such that Σ is satisfied on I+△I′. We first formalize the problem of incremental data repairing with DCs and establish its complexity. We then present techniques that combine auxiliary indexing structures to efficiently identify the DC violations incurred by △I w.r.t. Σ, and further develop an efficient repairing algorithm that computes △I′ by resolving these violations. Finally, using both real-life and synthetic datasets, we conduct extensive experiments to demonstrate the effectiveness and efficiency of our approach.
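To make the incremental setting concrete, the following is a minimal sketch of violation detection only, not the indexing or repairing algorithm of the paper. It assumes a DC is given as a conjunction of binary predicates over an ordered tuple pair (t, s), and that a pair violates the DC when all predicates hold. The example constraint, the attribute names, and the function names `violates` and `incremental_violations` are hypothetical and chosen purely for illustration. Because I is assumed consistent with Σ, only pairs involving at least one tuple of △I need to be examined.

```python
import operator
from itertools import product

# Hypothetical DC: no two tuples t, s may have the same state while
# t has a higher salary but a lower tax than s.
# Each predicate is (attribute of t, comparison, attribute of s).
DC = [
    ("state", operator.eq, "state"),
    ("salary", operator.gt, "salary"),
    ("tax", operator.lt, "tax"),
]


def violates(t, s, dc):
    """Return True if the ordered pair (t, s) satisfies every predicate of dc."""
    return all(op(t[a], s[b]) for a, op, b in dc)


def incremental_violations(instance, delta, dc):
    """Naive detection of the violations incurred by inserting delta.

    The existing instance is assumed to be consistent, so pairs made up
    only of existing tuples are skipped. This scans linearly and uses no
    auxiliary index.
    """
    found = []
    # Pairs mixing one inserted tuple and one existing tuple, both orders.
    for d in delta:
        for old in instance:
            if violates(d, old, dc):
                found.append((d, old))
            if violates(old, d, dc):
                found.append((old, d))
    # Pairs made up of two distinct inserted tuples.
    for t, s in product(delta, repeat=2):
        if t is not s and violates(t, s, dc):
            found.append((t, s))
    return found


# Illustrative data: I is consistent with the DC above.
I = [
    {"state": "NY", "salary": 50000, "tax": 5000},
    {"state": "NY", "salary": 80000, "tax": 9000},
]
delta_I = [
    {"state": "NY", "salary": 60000, "tax": 4000},
    {"state": "CA", "salary": 60000, "tax": 1000},
]

violations = incremental_violations(I, delta_I, DC)
```

In this illustrative data, the first inserted tuple earns more than the first existing NY tuple but pays less tax, so that pair is reported. The cost of this sketch is proportional to |△I|·|I| + |△I|² predicate evaluations per DC, which is the kind of scan that auxiliary indexing structures are meant to avoid.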