Abstract
Data incompleteness is becoming a significant roadblock to data quality. Existing research handles the certainty of a query by ignoring the incomplete part and approximating missing attributes on partially complete tuples, but leaves open the question of how the missing data affect the quality of the results. This question is particularly challenging when entire tuples are absent, which can affect query certainty in ways that are not immediately obvious. To address this, we propose cyadb, a database that "covers your ask" by assessing the quality of a query answer when data are missing. cyadb is a human-in-the-loop system in which the data owner uses their domain knowledge to specify aspects of the missing data: where data might be missing ("where"), how many data points are missing ("how many"), and how large the missing data points could be in comparison to the provided data ("how big"). From this specification, cyadb calculates the query's missing sensitivity, the maximal effect that the missing data could have on the given query. Additionally, cyadb provides concrete examples of missing data that attain the missing sensitivity, helping the user interactively refine the provided domain knowledge.
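To make the "where", "how many", and "how big" specification concrete, the following is a minimal sketch for a single SUM query over non-negative values. It is not cyadb's interface or algorithm: the `MissingSpec` class, the `sum_missing_sensitivity` function, and the reading of "how big" as a multiple of the largest observed value are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MissingSpec:
    """Hypothetical encoding of the data owner's domain knowledge."""
    where: str       # group in which tuples may be missing
    how_many: int    # upper bound on the number of missing tuples
    how_big: float   # bound on a missing value, as a multiple of the
                     # largest observed value in that group (assumption)


def sum_missing_sensitivity(rows, spec):
    """Bound how much SUM(value) could grow if the missing tuples were present.

    Assumes all values are non-negative, so the worst case is that every
    missing tuple takes the largest value the specification allows.
    Returns the bound and one set of missing tuples that attains it.
    """
    observed = [value for group, value in rows if group == spec.where]
    if not observed:
        return 0.0, []
    worst_value = spec.how_big * max(observed)
    witness = [(spec.where, worst_value)] * spec.how_many
    return spec.how_many * worst_value, witness


rows = [("east", 10.0), ("east", 40.0), ("west", 25.0)]
spec = MissingSpec(where="east", how_many=3, how_big=1.5)
sensitivity, witness = sum_missing_sensitivity(rows, spec)
print(sensitivity)  # 180.0
print(witness)      # three ("east", 60.0) tuples
```

Under these assumptions, the returned witness plays the role of the concrete example of missing data: if the owner finds a missing value of 60.0 implausible, that is a cue to tighten the "how big" bound and recompute.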