Continuous data cleaning

Maksims Volkovs, Jaroslaw Szlichta, Fei Chiang, Renee J Miller

doi:10.1109/icde.2014.6816655

Abstract

In declarative data cleaning, data semantics are encoded as constraints and errors arise when the data violates the constraints. Various forms of statistical and logical inference can be used to reason about and repair inconsistencies (errors) in data. Recently, unified approaches that repair both errors in data and errors in semantics (the constraints) have been proposed. However, both data-only approaches and unified approaches are by and large static in that they apply cleaning to a single snapshot of the data and constraints. We introduce a continuous data cleaning framework that can be applied to dynamic data and constraint environments. Our approach permits both the data and its semantics to evolve and suggests repairs based on the accumulated evidence to date. Importantly, our approach uses not only the data and constraints as evidence, but also considers the past repairs chosen and applied by a user (user repair preferences). We introduce a repair classifier that predicts the type of repair needed to resolve an inconsistency, and that learns from past user repair preferences to recommend more accurate repairs in the future. Our evaluation shows that our techniques achieve high prediction accuracy and generate high quality repairs. Of independent interest, our work makes use of a set of data statistics that are shown to be sensitive to predicting particular repair types.

Similar Papers

GEOMETRIC QUALITY ASSESSMENT OF LIDAR DATA BASED ON SWATH OVERLAP
A Sampath ... H K Heidemann
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences | VOL. XLI-B1
02 Jun 2016

Learning programming from erroneous worked-examples. Which type of error is beneficial for learning?
Maik Beege ... Günter Daniel Rey
Learning and Instruction | VOL. 75
30 May 2021

Estimation of Errors in Gene Expression Data Introduced by Diffractive Blurring of Confocal Images
Ekaterina Myasnikova ... Maria Samsonova
01 Jan 2009
