Abstract

On a stream \(\mathcal{S}\) of two-dimensional data items \((x,y)\), where \(x\) is an item identifier and \(y\) is a numerical attribute, a correlated aggregate query \(C(\sigma ,AGG,\mathcal{S})\) asks to first apply a selection predicate \(\sigma \) along the \(y\) dimension and then compute an aggregate \(AGG\) along the \(x\) dimension. For selection predicates of the form \((y < c)\) or \((y > c)\), where the parameter \(c\) is provided at query time, we present new streaming algorithms and lower bounds for estimating correlated aggregates. Our main result is a general method that reduces the estimation of a correlated aggregate \(AGG\) to the streaming computation of \(AGG\) over an entire stream, for any aggregate that satisfies certain conditions. This results in the first sublinear-space algorithms for the correlated estimation of a large family of statistics, including frequency moments. Our experimental validation shows that the memory requirements of these algorithms are significantly smaller than those of existing linear-storage solutions, and that they achieve a fast per-record processing time. We also study the setting in which items have weights. When weights can be negative, we give a strong space lower bound that holds even if the algorithm is allowed up to a logarithmic number of passes over the data. We complement this with a small-space algorithm that uses a logarithmic number of passes.
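To make the query definition concrete, the following is a minimal sketch of the exact, linear-storage baseline: it keeps every item, filters on \((y < c)\) at query time, and then aggregates over the identifiers. It is not the sublinear-space method described above. The names `correlated_aggregate` and `second_frequency_moment`, the sample stream, and the choice of the second frequency moment as \(AGG\) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Tuple

Item = Tuple[Hashable, float]  # (x, y): identifier and numerical attribute


def second_frequency_moment(ids: Iterable[Hashable]) -> int:
    """F2: sum of squared occurrence counts of each identifier."""
    counts = Counter(ids)
    return sum(f * f for f in counts.values())


def correlated_aggregate(
    stream: Iterable[Item],
    c: float,
    agg: Callable[[Iterable[Hashable]], int] = second_frequency_moment,
) -> int:
    """Exact answer for the predicate (y < c); needs the whole stream stored."""
    # Step 1: selection along the y dimension.
    selected_ids = (x for x, y in stream if y < c)
    # Step 2: aggregation along the x dimension.
    return agg(selected_ids)


# Hypothetical stream; the threshold c is only known at query time.
stream = [("a", 1.0), ("b", 5.0), ("a", 2.0), ("c", 0.5), ("b", 1.5)]
print(correlated_aggregate(stream, c=2.0))  # a, c, b each occur once
print(correlated_aggregate(stream, c=3.0))  # a occurs twice, b and c once
```

Because \(c\) arrives only at query time, this baseline must retain all items, which is the linear storage cost that a streaming estimator aims to avoid.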
