Abstract
This paper studies a dynamic setting for data quality improvement, in which data quality rules are repeatedly searched for and their violations repaired until stability is reached. The constraints considered are simple constant edit rules, discovered via association analysis; violations are repaired with the set cover method. The paper contributes to the field of data quality in three ways. First, it shows that, with appropriate filtering, association analysis is an appealing tool for discovering data quality rules with high precision. Second, when edit rules are limited to logical implications such as association rules, the time complexity of rule implication reduces, under reasonable circumstances, from exponential to quadratic. This result is formalized as the strong generator theorem. Third, it provides a detailed analysis of data repair in a dynamic setting and establishes the conditions for termination. Empirical results indicate that if the initial precision of the rules is high, repeated search-and-repair offers a boost in recall with a mitigated drop in precision.
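As a rough illustration of the search-and-repair loop described above, the following Python sketch mines constant rules of the form "if A = a then B = b" from a small table, repairs the rows that violate them, and repeats until no row changes. It is a minimal sketch under stated assumptions, not the paper's method: the function names, the thresholds `min_support` and `min_conf`, the `max_rounds` safeguard, and the toy data are all hypothetical, the rule search is a plain support/confidence count rather than a filtered association analysis, and the repair step simply overwrites the consequent value instead of using the set cover method.

```python
from collections import Counter
from itertools import permutations


def mine_rules(rows, min_support=2, min_conf=0.75):
    """Return constant rules (A, a, B, b), read as 'if A == a then B == b'.

    Assumption: a rule is kept when the pair (a, b) occurs at least
    min_support times and its confidence is at least min_conf.
    """
    attrs = list(rows[0].keys())
    rules = []
    for a_attr, b_attr in permutations(attrs, 2):
        pair_counts = Counter((r[a_attr], r[b_attr]) for r in rows)
        ante_counts = Counter(r[a_attr] for r in rows)
        for (a_val, b_val), n in pair_counts.items():
            conf = n / ante_counts[a_val]
            if n >= min_support and conf >= min_conf:
                rules.append((a_attr, a_val, b_attr, b_val))
    return rules


def repair(rows, rules):
    """Naive repair (not set cover): overwrite the violating consequent.

    Returns the number of cell values that were changed.
    """
    changed = 0
    for a_attr, a_val, b_attr, b_val in rules:
        for r in rows:
            if r[a_attr] == a_val and r[b_attr] != b_val:
                r[b_attr] = b_val
                changed += 1
    return changed


def search_and_repair(rows, max_rounds=10, **thresholds):
    """Alternate rule search and repair until no cell changes.

    max_rounds is a hypothetical safeguard; this sketch does not
    implement the paper's termination conditions.
    """
    rows = [dict(r) for r in rows]  # work on a copy of the input
    for _ in range(max_rounds):
        rules = mine_rules(rows, **thresholds)
        if repair(rows, rules) == 0:
            break
    return rows


# Toy data (hypothetical): one row has a misspelled city.
rows = [
    {"zip": "9000", "city": "Ghent"},
    {"zip": "9000", "city": "Ghent"},
    {"zip": "9000", "city": "Ghent"},
    {"zip": "9000", "city": "Gent"},
    {"zip": "1000", "city": "Brussels"},
    {"zip": "1000", "city": "Brussels"},
]
repaired = search_and_repair(rows, min_support=2, min_conf=0.75)
```

In this toy run, the rule "zip = 9000 implies city = Ghent" is found with confidence 0.75 in the first round and is used to rewrite the fourth row; the second round finds no violations, so the loop stops. Lowering the confidence threshold admits more rules but also more spurious repairs, which is the precision and recall trade-off the abstract refers to.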