Abstract

Incomplete data has been a longstanding issue in database community, and yet the subject is poorly handled by both theory and practice. In this paper, we propose to directly estimate the aggregate query result on incomplete data, rather than imputing the missing values. An interval estimation, composed of the upper and lower bound of aggregate query results among all possible interpretation of missing values, are presented to the end-users. The ground-truth aggregate result is guaranteed to be among the interval. Experimental results are consistent with the theoretical results, and suggest that the estimation is invaluable to better assess the results of aggregate queries on incomplete data.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call