Fast multivariate empirical cumulative distribution function with connection to kernel density estimation

Nicolas Langrené, Xavier Warin

doi:10.1016/j.csda.2021.107267

Abstract

The problem of efficiently computing empirical cumulative distribution functions (ECDFs) on large multivariate datasets is revisited. Computing an ECDF at one evaluation point requires $O(N)$ operations on a dataset composed of $N$ data points. Therefore, a direct evaluation of ECDFs at $N$ evaluation points requires a quadratic number of operations, $O(N^2)$, which is prohibitive for large-scale problems. Two fast and exact methods are proposed and compared. The first one is based on fast summation in lexicographical order, has $O(N \log N)$ complexity, and requires the evaluation points to lie on a regular grid. The second one is based on the divide-and-conquer principle, has $O(N \log(N)^{(d-1) \vee 1})$ complexity, and requires the evaluation points to coincide with the input points. The two fast algorithms are described in detail in the general $d$-dimensional case, and numerical experiments validate their speed and accuracy. In addition, a direct connection between cumulative distribution functions and kernel density estimation (KDE) is established for a large class of kernels. This connection paves the way for fast exact algorithms for multivariate kernel density estimation and kernel regression. Numerical tests with the Laplacian kernel validate the speed and accuracy of the proposed algorithms. A broad range of large-scale multivariate density estimation, cumulative distribution estimation, survival function estimation and regression problems can benefit from the proposed numerical methods.
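
To make the cost gap in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It compares direct ECDF evaluation with a standard binning-plus-cumulative-sum evaluation on a rectangular grid, which is similar in spirit to (but simpler than) the fast summation approach mentioned above. The function names `ecdf_naive` and `ecdf_on_grid` are hypothetical, and the grid axes are assumed to be sorted in increasing order.

```python
import numpy as np


def ecdf_naive(data, points):
    """Direct ECDF: compares every evaluation point with every data point."""
    # below[m, i] is True when data[i] <= points[m] in every coordinate.
    below = np.all(data[None, :, :] <= points[:, None, :], axis=2)
    return below.mean(axis=1)


def ecdf_on_grid(data, grid_axes):
    """ECDF at every node of a rectangular grid (assumed sorted axes).

    Illustrative sketch: bin each data point at the first grid node that
    dominates it, then take cumulative sums along every axis.
    """
    n, d = data.shape
    shape = tuple(len(g) for g in grid_axes)
    # Per coordinate: index of the first grid node >= the data value.
    idx = np.stack(
        [np.searchsorted(g, data[:, k], side="left")
         for k, g in enumerate(grid_axes)],
        axis=1,
    )
    # A point beyond the last node in any dimension is below no grid node.
    inside = np.all(idx < np.array(shape), axis=1)
    counts = np.zeros(shape)
    np.add.at(counts, tuple(idx[inside].T), 1.0)
    # Cumulative sums turn per-cell counts into "lower-left orthant" counts.
    for axis in range(d):
        counts = np.cumsum(counts, axis=axis)
    return counts / n


# Usage: both evaluations agree on a small random 2-D sample.
rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 2))
axes = [np.linspace(-3.0, 3.0, 25), np.linspace(-3.0, 3.0, 25)]
fast = ecdf_on_grid(data, axes)
nodes = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 2)
slow = ecdf_naive(data, nodes).reshape(fast.shape)
assert np.allclose(fast, slow)
```

In this sketch the grid version costs one sorted search per data point and coordinate plus one pass over the grid per dimension, whereas the direct version compares every evaluation point with every data point.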

Journal: Computational Statistics & Data Analysis	Publication Date: May 13, 2021
Citations: 14	License type: cc-by-nc-nd

Similar Papers

Geomathematics, mathematical background and geo-science applications: F.P. Agterberg, 1974. Developments in Geomathematics, Vol. 1. Elsevier, Amsterdam, 596 pp., Dfl. 135.00
J Harbaugh
Earth Science Reviews | VOL. 11
01 Mar 1975

Fast and Stable Multivariate Kernel Density Estimation by Fast Sum Updating
Nicolas Langrené ... Xavier Warin
Journal of Computational and Graphical Statistics | VOL. 28
13 Feb 2019

Assessing Goodness of Fit to a Gamma Distribution and Estimating Future Projection on Daily Precipitation Frequency Using Regional Climate Model Simulations over Japan with and without the Influence of Tropical Cyclones
Akihiko Murata ... Shun-Ichi I Watanabe
Journal of Hydrometeorology | VOL. 21
28 Oct 2020

Chapter 9 - Multivariate density estimation
Dag Tjøstheim ... Håkon Otneim
Statistical Modeling using Local Gaussian Approximation | VOL. -
01 Jan 2021
