Abstract
Variance is a popular and often necessary component of aggregation queries. It is typically used as a secondary measure to ascertain statistical properties of the result, such as its error, yet it is more expensive to compute than primary measures such as SUM, MEAN, and COUNT. There exist numerous techniques to compute variance. While the definition of variance implies two passes over the data, other mathematical formulations lead to a single-pass computation. Some single-pass formulations, however, can suffer from severe precision loss, especially for large datasets. In this paper, we study variance implementations in various real-world systems and find that major database systems such as PostgreSQL, and most likely System X, a major commercial closed-source database, use a representation that is efficient but suffers from floating-point precision loss resulting from catastrophic cancellation. We review literature from the past five decades on variance calculation in both the statistics and database communities, and summarize recommendations on implementing variance functions in various settings, such as approximate query processing and large-scale distributed aggregation. Interestingly, we recommend using the mathematical formula for variance when two passes over the data are acceptable, owing to its precision, parallelizability, and, surprisingly, its computation speed.
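As a brief illustration (not taken from the paper itself), the following Python sketch contrasts the two-pass definition of variance with the naive single-pass sum-of-squares formulation; the latter subtracts two large, nearly equal quantities and can suffer catastrophic cancellation when the mean is large relative to the spread. The function names and the example data are illustrative assumptions.

```python
# Illustrative sketch: two-pass vs. naive single-pass (population) variance.
# The data below (large mean, small spread) is a hypothetical example chosen
# to expose catastrophic cancellation in the single-pass formula.

def variance_two_pass(xs):
    # Pass 1: compute the mean; pass 2: average the squared deviations.
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def variance_single_pass(xs):
    # "Textbook" one-pass formula: E[x^2] - E[x]^2.
    # Subtracting two nearly equal large numbers loses precision.
    n = len(xs)
    total = sum_sq = 0.0
    for x in xs:
        total += x
        sum_sq += x * x
    mean = total / n
    return sum_sq / n - mean * mean

data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]
print(variance_two_pass(data))     # ~22.5 (correct)
print(variance_single_pass(data))  # may be badly off, possibly even negative
```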