Abstract
A query Q is boundedly evaluable under a set A of access constraints if, for every dataset D that satisfies A, there exists a fraction DQ of D such that Q(D) = Q(DQ), and both the size of DQ and the time needed to identify DQ are independent of the size of D. That is, Q(D) can be computed by accessing a bounded amount of data no matter how large D grows. Desirable as this property is, it is undecidable whether a query in relational algebra (RA) is boundedly evaluable under A. In light of this undecidability, this paper develops an effective syntax for boundedly evaluable RA queries. We identify a class of covered RA queries such that under A, (a) every boundedly evaluable RA query is equivalent to a covered query, (b) every covered RA query is boundedly evaluable, and (c) it can be checked in PTIME in |Q| and |A| whether Q is covered by A. We provide quadratic-time algorithms to check whether Q is covered by A, and to generate a bounded query plan for a covered Q. We also study a new optimization problem of minimizing access constraints for covered queries. Using real-life data, we experimentally verify that a large number of RA queries in practice are covered, and that bounded query plans improve RA query evaluation by orders of magnitude.
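To make the definition concrete, the following minimal Python sketch illustrates the idea on a toy dataset. It is not the paper's algorithm. It assumes, beyond what the abstract states, that an access constraint says each key value has at most N associated values, retrievable through an index. The class `IndexedRelation`, the relations `friend` and `lives_in`, and the query itself are hypothetical names chosen for illustration only.

```python
from collections import defaultdict


class IndexedRelation:
    """Binary relation R(x, y) with an index on x (hypothetical helper)."""

    def __init__(self, pairs):
        self.index = defaultdict(set)
        for x, y in pairs:
            self.index[x].add(y)
        self.fetched = 0  # number of tuples read through the index

    def satisfies(self, bound):
        # Assumed constraint form: every x has at most `bound` y-values.
        return all(len(ys) <= bound for ys in self.index.values())

    def fetch(self, x):
        ys = self.index.get(x, set())
        self.fetched += len(ys)
        return ys


def friends_in_city(friend, lives_in, person, city):
    """Index-only plan for: friends of `person` who live in `city`."""
    return {f for f in friend.fetch(person) if city in lives_in.fetch(f)}


def run(num_people):
    # Person 0 always has friends 1, 2, 3; everything else is padding
    # that grows with num_people.
    friend = IndexedRelation(
        [(0, 1), (0, 2), (0, 3)] + [(p, p + 1) for p in range(4, num_people)]
    )
    lives_in = IndexedRelation(
        [(p, "NYC" if p % 2 else "LA") for p in range(num_people + 1)]
    )
    # The toy data satisfies the assumed constraints:
    # at most 3 friends per person, at most 1 city per person.
    assert friend.satisfies(3) and lives_in.satisfies(1)
    answer = friends_in_city(friend, lives_in, 0, "NYC")
    return answer, friend.fetched + lives_in.fetched


print(run(10))       # small dataset
print(run(100_000))  # much larger dataset, same answer and same access count
```

In this sketch the plan reads at most 3 + 3 tuples regardless of how many people the dataset contains, which is the sense in which the amount of data accessed is independent of the size of D.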