Robust Estimation With Sampling and Approximate Pre-Aggregation

Christopher Jermaine

doi:10.1016/b978-012722442-8/50083-5

Abstract

This chapter considers the problem of approximating aggregate functions over categorical data or mixed categorical/numerical data. It proposes approximate pre-aggregation (APA), a framework built on random sampling that uses simple summary statistics to greatly increase the accuracy of sampling-based estimates for aggregate queries over such data. This matters because many previous estimation techniques have largely ignored categorical data. APA is based on sound statistical techniques such as maximum likelihood estimation and constrained quadratic programming. It is also suitable for estimation in a streaming environment, since the information APA uses can be collected in a single database scan. The biggest drawback of sampling for aggregate function estimation is its sensitivity to attribute value skew, and APA uses several techniques to overcome this sensitivity. The increase in accuracy of APA over “plain vanilla” sampling is dramatic: for SUM and AVG queries, the relative error of random sampling alone is more than 700% greater than that of sampling with APA. Even when stratified sampling techniques are used, the error is still between 28% and 175% greater than for APA.
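The sensitivity to attribute value skew mentioned above can be illustrated with a minimal sketch. It is not the APA method, only the "plain vanilla" scale-up estimator for a SUM query over a uniform sample. The table contents, the function names `estimate_sum` and `relative_error`, and the two hand-picked samples are all assumptions made for illustration.

```python
def estimate_sum(sample, population_size, predicate):
    """Scale-up estimator: (N / n) * sum of sampled values whose category matches."""
    n = len(sample)
    matched = sum(value for category, value in sample if predicate(category))
    return population_size / n * matched


def relative_error(estimate, truth):
    """Absolute error of the estimate as a fraction of the true answer."""
    return abs(estimate - truth) / abs(truth)


# Hypothetical skewed table of (category, value) rows:
# 99 rows with value 1.0 and a single row with value 901.0.
population = [("a", 1.0)] * 99 + [("a", 901.0)]
true_sum = sum(value for _, value in population)


def is_a(category):
    # Predicate of the illustrative query: SUM(value) WHERE category = 'a'.
    return category == "a"


# Two possible 10-row samples: one misses the large value, one contains it.
sample_without_outlier = [("a", 1.0)] * 10
sample_with_outlier = [("a", 1.0)] * 9 + [("a", 901.0)]

low = estimate_sum(sample_without_outlier, len(population), is_a)
high = estimate_sum(sample_with_outlier, len(population), is_a)

print(true_sum, low, relative_error(low, true_sum))
print(true_sum, high, relative_error(high, true_sum))
```

In this toy table the true sum is 1000. The sample that misses the large value underestimates it by 90%, and the sample that contains it overestimates it by 810%, so the estimate depends almost entirely on whether one row happens to be drawn.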

Journal: Proceedings 2003 VLDB Conference
Publication Date: Jan 1, 2003
Citations: 42
