Abstract

Numerous facets of scientific research implicitly or explicitly call for the estimation of probability densities. Histograms and kernel density estimates (KDEs) are two commonly used techniques for estimating such information, with the KDE generally providing a higher-fidelity representation of the probability density function (PDF). Both methods require specification of either a bin width or a kernel bandwidth. While techniques exist for choosing the kernel bandwidth optimally and objectively, they are computationally intensive, since they require repeated calculation of the KDE. A solution for objectively and optimally choosing both the kernel shape and width has recently been developed by Bernacchia and Pigolotti (2011). While this solution theoretically applies to multidimensional KDEs, it has not been clear how to apply it in practice. A method for practically extending the Bernacchia–Pigolotti KDE to multiple dimensions is introduced. This multidimensional extension is combined with a recently developed computational improvement to their method that makes it computationally efficient: a 2D KDE on 10^5 samples takes only 1 s on a modern workstation. This fast and objective KDE method, called the fastKDE method, retains the excellent statistical convergence properties that have been demonstrated for univariate samples. The fastKDE method exhibits statistical accuracy that is comparable to state-of-the-science KDE methods publicly available in R, and it produces kernel density estimates several orders of magnitude faster. The fastKDE method does an excellent job of encoding covariance information for bivariate samples. This property allows for direct calculation of conditional PDFs with fastKDE. It is demonstrated how this capability might be leveraged for detecting non-trivial relationships between quantities in physical systems, such as transitional behavior.
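
As an illustration of why a joint PDF estimate gives direct access to conditional PDFs, the sketch below divides a gridded bivariate PDF by its marginal, p(y | x) = p(x, y) / p(x). It is a minimal sketch under stated assumptions, not the fastKDE implementation: the function name `conditional_pdf`, the `min_marginal` threshold, and the analytic bivariate normal used as a stand-in for a KDE output are all hypothetical choices made for this example.

```python
import numpy as np

def conditional_pdf(joint, y, min_marginal=1e-12):
    """Return p(y | x) and p(x) from joint[i, j] = p(x_i, y_j) on a uniform y grid.

    Hypothetical helper for illustration; not part of any library API.
    """
    dy = y[1] - y[0]
    # Marginal p(x_i): integrate the joint PDF over y with a Riemann sum.
    marginal = joint.sum(axis=1) * dy
    # Leave the conditional undefined (NaN) where the marginal is too small.
    cond = np.full_like(joint, np.nan)
    ok = marginal > min_marginal
    cond[ok] = joint[ok] / marginal[ok, None]
    return cond, marginal

# Stand-in for a gridded joint PDF estimate: a bivariate normal with
# unit variances and correlation rho (assumed values for this example).
rho = 0.8
x = np.linspace(-3.0, 3.0, 61)
y = np.linspace(-8.0, 8.0, 321)
X, Y = np.meshgrid(x, y, indexing="ij")
joint = np.exp(-(X**2 - 2 * rho * X * Y + Y**2) / (2 * (1 - rho**2)))
joint /= 2 * np.pi * np.sqrt(1 - rho**2)

cond, marginal = conditional_pdf(joint, y)

# Conditional mean of y at each x; for this stand-in PDF it should track rho * x.
dy = y[1] - y[0]
cond_mean = (cond * y[None, :]).sum(axis=1) * dy
```

Each row of `cond` is a PDF in y for a fixed x, so a change in the shape of those rows across x is the kind of signal one would inspect when looking for transitional behavior.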

Highlights

  • For finite N, the kernel effectively inherits its main shape and orientation from the underlying data. We demonstrate that these properties result in convergence even for probability density functions (PDFs) with non-trivial covariance structure by examining the convergence properties of the fastKDE method on two additional bivariate PDFs: a PDF with a relatively subtle transition in the relationship between two variables, and a complex mixture of normal distributions

  • We designed the fastKDE method with the intent that it be used as a general-purpose kernel density estimation (KDE) tool

Summary

Introduction

Numerous facets of scientific research implicitly or explicitly call for the estimation of probability densities. In a review of automatic selection methods, Heidenreich et al. (2013) recommend a variety of different methods, depending on dataset characteristics (including sample size, distribution smoothness, and skewness). The drawbacks of existing selection methods appear to have inhibited the widespread adoption of KDEs as a fundamental data analysis tool. To get around the difficulty of a potentially subjective choice of kernel shape and kernel bandwidth, Bernacchia and Pigolotti (2011) derive a method for objectively determining both (implemented in Stata by Luedicke and Bernacchia, 2014). They define a Fourier-based (and typically low-pass) filter on the empirical characteristic function (ECF) of a given dataset that yields an empirical kernel that is optimal in the sense that the integrated squared difference between the resulting KDE and the true PDF is minimized. We apply this method to examine the distribution of precipitation and temperature in California, USA, conditioned on global mean temperature (Section 4), which shows a number of intriguing features that prompt further investigation.
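
To make the idea of filtering the ECF concrete, the sketch below builds a univariate KDE by multiplying the ECF by a low-pass filter and inverse Fourier transforming the product. It is a minimal sketch under stated assumptions: the filter here is a fixed Gaussian with a hand-picked bandwidth `h`, swapped in for illustration only, whereas the Bernacchia and Pigolotti method derives the filter from the data. The function names `ecf` and `kde_from_ecf`, the grids, and the value of `h` are all hypothetical.

```python
import numpy as np

def ecf(samples, t):
    """Empirical characteristic function: mean of exp(i * t * x_j) over the samples."""
    return np.exp(1j * np.outer(t, samples)).mean(axis=1)

def kde_from_ecf(samples, x, t, kernel_ft):
    """Density estimate at points x from the filtered ECF on a uniform frequency grid t.

    kernel_ft(t) is the Fourier transform of the kernel, i.e. the filter
    applied to the ECF. Hypothetical helper; not a library API.
    """
    phi = ecf(samples, t) * kernel_ft(t)
    dt = t[1] - t[0]
    # f(x) = (1 / 2 pi) * integral of exp(-i t x) * phi(t) dt, as a Riemann sum.
    integrand = np.exp(-1j * np.outer(x, t)) * phi
    return (integrand.sum(axis=1) * dt).real / (2 * np.pi)

rng = np.random.default_rng(0)
samples = rng.normal(size=1000)

x = np.linspace(-6.0, 6.0, 241)    # points at which to evaluate the density
t = np.linspace(-30.0, 30.0, 801)  # frequency grid, wide enough for this filter

# Stand-in low-pass filter (assumption): Fourier transform of a Gaussian
# kernel with bandwidth h. This is NOT the data-derived optimal filter.
h = 0.3
pdf = kde_from_ecf(samples, x, t, lambda freq: np.exp(-0.5 * (h * freq) ** 2))
```

With this particular filter the result coincides with an ordinary Gaussian KDE of bandwidth `h`, which shows that choosing a kernel and choosing a filter on the ECF are the same decision viewed in different domains.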

The self-consistent KDE in arbitrary dimensions
Convergence on multidimensional normal distributions
Convergence on complex distributions
Comparison with existing bandwidth selection methods
Joint occurrence of temperature and precipitation
Findings
Discussion
Summary