Abstract
The Minimum Mutual Information (MinMI) Principle provides the least committed, maximum-joint-entropy (ME) inferential law that is compatible with prescribed marginal distributions and empirical cross constraints. Here, we estimate MI bounds (the MinMI values) generated by constraining sets T_cr comprising m_cr linear and/or nonlinear joint expectations, computed from samples of N iid outcomes. Marginals (and their entropies) are imposed by single morphisms of the original random variables. N-asymptotic formulas are given for the distribution of the cross expectations' estimation errors, as well as for the MinMI estimation bias, variance and distribution. A growing T_cr leads to an increasing MinMI, eventually converging to the total MI. Under N-sized samples, the MinMI increment between two nested sets T_cr1 ⊂ T_cr2 (with numbers of constraints m_cr1 < m_cr2) is the test difference δH = H_max1,N − H_max2,N ≥ 0 between the two respective estimated MEs. Asymptotically, δH follows a Chi-Squared distribution (1/2N)χ²(m_cr2 − m_cr1), whose upper quantiles determine whether the constraints in T_cr2 \ T_cr1 explain significant extra MI. As an example, we set the marginals to be normally distributed (Gaussian) and build a sequence of MI bounds associated with successive nonlinear correlations due to joint non-Gaussianity. Since available sample sizes can be rather small in real-world situations, the relationship between MinMI bias, probability-density over-fitting and outliers is demonstrated for under-sampled data.
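As a quick illustration of how the asymptotic result above can be applied, the following minimal Python sketch tests whether an observed δH exceeds the upper α-quantile of its null distribution (1/2N)χ²(m_cr2 − m_cr1). The function name and numerical values are illustrative assumptions, not taken from the paper.

from scipy.stats import chi2

def extra_mi_is_significant(delta_H, N, m_cr1, m_cr2, alpha=0.05):
    # Degrees of freedom = number of extra cross constraints in T_cr2 \ T_cr1
    dof = m_cr2 - m_cr1
    # Upper alpha-quantile of the asymptotic null distribution (1/(2N)) * chi2(dof)
    threshold = chi2.ppf(1.0 - alpha, dof) / (2.0 * N)
    return delta_H > threshold

# Illustrative numbers: N = 500 samples, 2 extra constraints, observed delta_H = 0.01 nats
print(extra_mi_is_significant(delta_H=0.01, N=500, m_cr1=3, m_cr2=5))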
Highlights
This paper addresses the problem of estimating the MI conveyed by the least committed inferential law (i.e., the conditional probability density function (pdf) p(Y | X) between random variables (RVs) Y and X) that is compatible with prescribed marginal distributions and a set T_cr of m_cr empirical, non-redundant cross constraints.
This paper presents theoretical formulas for the statistics of estimation errors of information-theoretic measures.
This is quite relevant because finite samples can exhibit spurious statistical structure, leading to negatively biased estimates of entropy or positively biased estimates of mutual information.
Summary
The seminal work of Shannon on Information Theory [1] gave rise to the concept of Mutual Information, where H_max,N denotes the ME estimate issued from N-sized samples of iid outcomes. The corresponding estimation errors are roughly similar to the errors of generic MI and entropy estimators (see [13,14] for a thorough review and performance comparisons between MI estimators). Their mean (bias), variance and higher-order moments are written in terms of powers of N^(-1), covering intermediate and asymptotic N ranges [15], with specific applications in neurophysiology [16,17,18]. Such estimators include kernel density estimators, adaptive or non-adaptive grids, nearest neighbors, and others specially designed for small samples [21,22].
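To make the finite-sample bias concrete, here is a small Monte Carlo sketch (not from the paper; the alphabet size, sample size and trial count are illustrative assumptions) showing that the plug-in MI estimate between two independent discrete variables is positively biased even though the true MI is zero, with mean close to the standard (k−1)²/(2N) nats approximation.

import numpy as np

def plugin_mi(x, y, k):
    # Plug-in (maximum-likelihood) MI estimate in nats from a k x k contingency table
    joint = np.zeros((k, k))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
k, N, trials = 8, 100, 2000
# Independent samples: the true MI is exactly zero
mi_hat = [plugin_mi(rng.integers(0, k, N), rng.integers(0, k, N), k)
          for _ in range(trials)]
# Mean plug-in estimate vs. the approximate bias (k-1)^2 / (2N) nats
print(np.mean(mi_hat), (k - 1) ** 2 / (2 * N))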