Abstract

This work addresses the problem of Shannon entropy estimation over countably infinite alphabets by studying and adopting recent convergence results for the entropy functional, which is known to be discontinuous on the space of probability measures over ∞-alphabets. Sufficient conditions for the convergence of the entropy are used in conjunction with deviation inequalities, covering scenarios in which the target distribution has finite as well as infinite support. From this perspective, four plug-in histogram-based estimators are studied, showing that these convergence results are instrumental for deriving new strongly consistent estimators of the entropy. The main application of this methodology is a new data-driven partition (plug-in) estimator. This scheme uses the data to restrict the support on which the distribution is estimated, finding an optimal balance between estimation and approximation errors. The proposed scheme offers a consistent (distribution-free) estimator of the entropy in ∞-alphabets and optimal rates of convergence under certain regularity conditions on the problem (a finite but unknown support, or tail-bounded conditions on the target distribution).
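
To make the data-driven restriction of the support concrete, the sketch below keeps only the symbols whose empirical frequency exceeds a threshold and evaluates the plug-in entropy on the retained, renormalized support. This is only an illustrative variant under stated assumptions, not the estimator analyzed in the paper; the function name and the threshold choice tau ≈ n^(-1/2) are hypothetical.

```python
import math
import random
from collections import Counter

def restricted_support_entropy(samples, tau=None):
    """Plug-in Shannon entropy (in nats) on a data-driven restriction of the
    support: symbols whose empirical mass is below tau are discarded, trading
    approximation error (removed tail mass) against estimation error
    (noisy low-count frequencies)."""
    n = len(samples)
    counts = Counter(samples)
    if tau is None:
        # Hypothetical threshold; the paper analyzes how this design
        # parameter must scale with n to balance the two error terms.
        tau = 1.0 / math.sqrt(n)
    kept = {a: c for a, c in counts.items() if c / n >= tau}
    m = sum(kept.values())  # renormalize on the retained support
    return -sum((c / m) * math.log(c / m) for c in kept.values())

# Example: samples from a heavy-tailed source over the positive integers.
random.seed(0)
samples = [int(random.paretovariate(1.5)) for _ in range(10_000)]
print(restricted_support_entropy(samples))
```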

Highlights

  • Shannon entropy estimation has a long history in information theory, statistics, and computer science [1]

  • In view of the discontinuity of the entropy over ∞-alphabets [24] and the results that guarantee entropy convergence [25,26,27,31], this work revisits the problem of pointwise almost-sure entropy estimation in ∞-alphabets from the perspective of studying and applying entropy convergence results and the bounds derived from them [25,26,31]

  • Our main conjecture is that placing these conditions in the context of a learning task, i.e., where {μn : n ≥ 0} is a random sequence of distributions induced by the classical empirical process, makes it possible to study a broad family of plug-in estimators and to derive new strong consistency and rate-of-convergence results

Summary

Introduction

Shannon entropy estimation has a long history in information theory, statistics, and computer science [1]. Entropy and related information measures (conditional entropy and mutual information) play a fundamental role in information theory and statistics [2,3] and, as a consequence, they have found numerous applications in learning and decision-making tasks [4,5,6,7,8,9,10,11,12,13,14,15]. In many of these contexts, distributions are not available and the entropy needs to be estimated from empirical data. More recent research has focused on the so-called large alphabet (or large dimensional) regime, meaning a non-asymptotic under-sampling regime where the number of samples n is on the order of, or even smaller than, the size of the alphabet, denoted by k. In this context, it has been shown that the classical plug-in estimator is sub-optimal, as it suffers from severe bias [17,18]. These findings are consistent with the observation that the entropy is a continuous functional on the space of distributions (in the total variation distance sense) in the finite alphabet case [2,23,24,25].
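
As a minimal numerical illustration of this bias (an assumption-laden sketch, not an experiment from the paper), the following Python snippet draws n = k samples from a uniform source over k symbols and compares the classical plug-in estimate with the true entropy log k.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Classical plug-in estimator: Shannon entropy (nats) of the empirical distribution."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())

# Under-sampled ("large alphabet") regime: n comparable to the alphabet size k.
k = 10_000
n = k
random.seed(0)
samples = [random.randrange(k) for _ in range(n)]  # uniform source over k symbols
print(f"true entropy : {math.log(k):.3f} nats")
print(f"plug-in value: {plugin_entropy(samples):.3f} nats")
# The plug-in value falls well below log(k) because a sizable fraction of the
# alphabet is never observed, illustrating the negative bias noted above.
```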

The Challenging Infinite Alphabet Learning Scenario
From Convergence Results to Entropy Estimation
Contributions
Organization
Preliminaries
Convergence Results for the Shannon Entropy
Shannon Entropy Estimation
The Barron-Györfi-van der Meulen Estimator
A Data-Driven Histogram-Based Estimator
Discussion of the Results and Final Remarks
Proof of the Main Results