Abstract

Sequential quantile estimation, also known as online quantile estimation, refers to incorporating observations into quantile estimates incrementally, thus furnishing an online estimate of one or more quantiles at any given point in time. This area is relevant to the analysis of data streams and to the one-pass analysis of massive data sets. Applications include network traffic and latency analysis, real-time fraud detection and high-frequency trading. We introduce new techniques for online quantile estimation based on Hermite series estimators in the settings of static quantile estimation and dynamic quantile estimation. In the static setting we apply the existing Gauss-Hermite expansion in a novel manner; in particular, we exploit the fact that Gauss-Hermite coefficients can be updated sequentially. To treat dynamic quantile estimation we introduce a novel expansion with an exponentially weighted estimator for the Gauss-Hermite coefficients, which we term the Exponentially Weighted Gauss-Hermite (EWGH) expansion. These algorithms go beyond existing sequential quantile estimation algorithms in that they allow arbitrary quantiles (as opposed to pre-specified quantiles) to be estimated at any point in time. In doing so we provide a solution to online distribution function and online quantile function estimation on data streams. In particular, we derive an analytical expression for the CDF and prove consistency results for the CDF under certain conditions. In addition, we analyse the associated quantile estimator. Simulation studies and tests on real data reveal the Gauss-Hermite based algorithms to be competitive with a leading existing algorithm.
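
The sequential update of the Gauss-Hermite coefficients can be sketched as follows. This is a minimal illustration, not the article's implementation: it assumes the standard orthonormal Hermite functions h_k and coefficient estimates formed as (equally or exponentially weighted) averages of h_k(x_i). The names HermiteCoefficients, hermite_functions and lam are hypothetical.

```python
import math


def hermite_functions(x, N):
    """Return [h_0(x), ..., h_N(x)] via the three-term recurrence."""
    h = [math.pi ** -0.25 * math.exp(-0.5 * x * x)]
    if N >= 1:
        h.append(math.sqrt(2.0) * x * h[0])
    for k in range(1, N):
        h.append(math.sqrt(2.0 / (k + 1)) * x * h[k]
                 - math.sqrt(k / (k + 1)) * h[k - 1])
    return h


class HermiteCoefficients:
    """Running estimates of a_0..a_N; lam=None gives the equal-weight case."""

    def __init__(self, N, lam=None):
        self.N = N
        self.lam = lam          # assumed smoothing weight in (0, 1]
        self.n = 0              # number of observations seen so far
        self.a = [0.0] * (N + 1)

    def update(self, x):
        self.n += 1
        # Equal weights: weight 1/n (running mean). Exponential weights: the
        # first observation initialises the coefficients, later ones get lam.
        if self.lam is None or self.n == 1:
            w = 1.0 / self.n
        else:
            w = self.lam
        h = hermite_functions(x, self.N)
        for k in range(self.N + 1):
            self.a[k] += w * (h[k] - self.a[k])
```

Each call to update touches only the N + 1 stored coefficients, so its cost does not grow with the number of observations already processed, and no past observations need to be kept.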

Highlights

  • Algorithms for elucidating the statistical properties of streams of data in real time and for the efficient one-pass analysis of massive data sets are becoming increasingly pertinent

  • In this article we propose new distribution function and quantile estimators based on Hermite series expansions and study their properties

  • In this article we have defined a cumulative distribution function estimator based on Hermite series estimators which allows quantiles to be obtained numerically

Summary

Introduction

Algorithms for elucidating the statistical properties of streams of data in real time and for the efficient one-pass analysis of massive data sets are becoming increasingly pertinent. Both the empirical distribution function (EDF) and the kernel distribution function estimator only allow sequential estimation of the cumulative probability at a set of fixed x values (see chapters 4 and 5 of [13] and chapter 7 of [8] for a discussion of recursive kernel estimators). For quantile estimation, both the sample quantile estimator and L-estimators such as the kernel quantile estimator and the Bernstein-Durrmeyer estimator require the storage and updating of one or more order statistics (a sorted sequence of all observations seen so far). In this article we propose new techniques based on Hermite series estimators to maintain an online estimate of the CDF and the full quantile function in both the static and dynamic settings. These techniques yield estimates of the cumulative probability at arbitrary x and estimates of arbitrary quantiles, and the estimates can be updated in constant (O(1)) time. This is the primary advantage of our suggested approach. Useful mean integrated squared error (MISE) results for the exponentially weighted Gauss-Hermite expansion are derived in appendix B.
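
To illustrate how a fixed set of Hermite coefficients gives the cumulative probability at arbitrary x and arbitrary quantiles, the sketch below integrates the truncated series numerically and inverts the result by bisection. It is a rough sketch under stated assumptions: it uses trapezoidal quadrature rather than the analytical CDF expression derived in the article, it assumes the standard orthonormal Hermite functions, and the function names, integration limits and step counts are hypothetical.

```python
import math


def hermite_functions(x, N):
    """Return [h_0(x), ..., h_N(x)] via the three-term recurrence."""
    h = [math.pi ** -0.25 * math.exp(-0.5 * x * x)]
    if N >= 1:
        h.append(math.sqrt(2.0) * x * h[0])
    for k in range(1, N):
        h.append(math.sqrt(2.0 / (k + 1)) * x * h[k]
                 - math.sqrt(k / (k + 1)) * h[k - 1])
    return h


def density(a, x):
    """Truncated series sum_k a_k h_k(x); it can dip below zero."""
    h = hermite_functions(x, len(a) - 1)
    return sum(ak * hk for ak, hk in zip(a, h))


def cdf(a, x, lower=-10.0, steps=1000):
    """Trapezoidal integral of the density on [lower, x], clipped to [0, 1]."""
    if x <= lower:
        return 0.0
    dx = (x - lower) / steps
    total = 0.5 * (density(a, lower) + density(a, x))
    for i in range(1, steps):
        total += density(a, lower + i * dx)
    return min(1.0, max(0.0, total * dx))


def quantile(a, p, lo=-10.0, hi=10.0, tol=1e-4):
    """Bisection search for an x with cdf(a, x) close to p."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, the standard normal density is exactly (4π)^(-1/4) h_0(x), so the coefficient list [(4 * math.pi) ** -0.25] should give a median near 0 and a 0.975 quantile near 1.96. Because a truncated series estimate is not guaranteed to be non-negative, the numerical CDF need not be monotone, and bisection then returns one crossing point rather than a unique quantile.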

Hermite polynomials
Gauss-Hermite expansion
Truncated Gauss-Hermite expansions and nonparametric density estimation
Cumulative distribution function
Inverse cumulative distribution function
Online quantile estimation
Selection of N
Selection of the parameters λ and N
Quality of CDF and quantile estimates
Quality of the cumulative distribution function estimate
Quality of quantile estimate
Simulation results
IID data
The chi-squared distribution
The exponential distribution
Non-identically distributed data
Normal distribution with drift
Exponential distribution with drift
Real data results
Conclusion
Static quantile estimation
Findings
MISE of the exponentially weighted Gauss-Hermite expansion for IID data
