Abstract

Nonparametric estimation of mutual information is used in a wide range of scientific problems to quantify dependence between variables. k-nearest neighbor (knn) methods are consistent and are therefore expected to work well for large sample sizes. These methods use geometrically regular local volume elements. This practice allows maximum localization of the volume elements, but it can also induce a bias due to a poor description of the local geometry of the underlying probability measure. We introduce a new class of knn estimators, which we call geometric knn (g-knn) estimators, that use more complex local volume elements to better model the local geometry of the probability measures. As an example of this class of estimators, we develop a g-knn estimator of entropy and mutual information based on elliptical volume elements, capturing the local stretching and compression common to a wide range of dynamical system attractors. A series of numerical examples in which the thickness of the underlying distribution and the sample sizes are varied suggests that local geometry is a source of problems for knn methods such as the Kraskov-Stögbauer-Grassberger (KSG) estimator when local geometric effects cannot be removed by global preprocessing of the data. The g-knn method performs well despite these manipulations of the local geometry. In addition, the examples suggest that g-knn estimators can be of particular relevance to applications in which the system is large but the sample size is limited.
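To make the idea concrete, the following is a minimal sketch of how elliptical volume elements could replace the balls in a Kozachenko-Leonenko-style knn entropy estimate, which is the standard building block underlying estimators like KSG. The function name gknn_entropy, the use of the local scatter matrix of the k nearest neighbors, and the rule that inflates the ellipsoid just enough to contain all k neighbors are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def gknn_entropy(x, k=8):
    """Sketch of a g-knn entropy estimate with ellipsoidal volume elements.

    Follows the Kozachenko-Leonenko form, but the ball around each point
    is replaced by an ellipsoid fitted to the local scatter of its k
    nearest neighbors, then inflated just enough to contain all of them.
    """
    n, d = x.shape
    tree = cKDTree(x)
    _, idx = tree.query(x, k=k + 1)          # idx[:, 0] is the point itself
    # log volume of the unit d-ball: pi^(d/2) / Gamma(d/2 + 1)
    log_cd = 0.5 * d * np.log(np.pi) - gammaln(0.5 * d + 1)
    log_vol = np.empty(n)
    for i in range(n):
        nbrs = x[idx[i, 1:]] - x[i]          # k neighbors, centered on x_i
        cov = nbrs.T @ nbrs / k              # local scatter matrix
        evals, evecs = np.linalg.eigh(cov)
        evals = np.maximum(evals, 1e-12)     # guard against degenerate scatter
        w = nbrs @ evecs                     # neighbor coords in the eigenbasis
        # smallest inflation s such that the ellipsoid with semi-axes
        # s * sqrt(evals) contains every neighbor
        s = np.sqrt(np.max(np.sum(w**2 / evals, axis=1)))
        log_vol[i] = log_cd + d * np.log(s) + 0.5 * np.sum(np.log(evals))
    # Kozachenko-Leonenko estimator with ellipsoidal volume elements;
    # for isotropic local scatter this reduces to the usual knn estimate
    return digamma(n) - digamma(k) + log_vol.mean()
```

A quick sanity check on a case where the answer is known in closed form: for a standard bivariate Gaussian the differential entropy is log(2*pi*e), roughly 2.84 nats, so the estimate can be compared against that value.

```python
rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 2))
print(gknn_entropy(x))   # compare with log(2*pi*e) ~ 2.84 for a 2D standard Gaussian
```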
