Abstract

In this work, we address the open problem of involving non-expert users as first-class citizens in data repairing. Although many proposals address data cleaning from the point of view of expert users (IT staff and data scientists), few studies take the perspective of non-expert users. Given a set of available data quality rules, we exploit machine learning techniques to guide the user in identifying the dirty values in each violation and repairing them. We show that, with low user effort, it is possible to identify which values in a tuple can be trusted and which are most likely errors. We show experimentally that this machine-learning approach leads to a unique, high-quality clean solution in scenarios where other approaches fail.
