BUNNI: Learning Repair Actions in Rule-driven Data Cleaning

Giansalvatore Mecca, Paolo Papotti, Donatello Santoro, Enzo Veltri

doi:10.1145/3665930

Abstract

In this work, we address the challenging and open problem of involving non-expert users as first-class citizens in data repairing. Although many proposals address data cleaning from the point of view of expert users (IT staff and data scientists), few studies take the perspective of non-expert users. Given a set of available data quality rules, we exploit machine learning techniques to guide the user in identifying the dirty values in each violation and repairing them. We show that, with low user effort, it is possible to identify which values in the tuples can be trusted and which are most likely errors. We show experimentally that this machine-learning approach leads to a unique, high-quality clean solution in scenarios where other approaches fail.

Full Text