Abstract
We generalize the Jensen-Shannon divergence and the Jensen-Shannon diversity index by considering a variational definition with respect to a generic mean, thereby extending the notion of Sibson’s information radius. The variational definition applies to any arbitrary distance and yields a new way to define a Jensen-Shannon symmetrization of distances. When the variational optimization is further constrained to belong to prescribed families of probability measures, we get relative Jensen-Shannon divergences and their equivalent Jensen-Shannon symmetrizations of distances that generalize the concept of information projections. Finally, we touch upon applications of these variational Jensen-Shannon divergences and diversity indices to clustering and quantization tasks of probability measures, including statistical mixtures.
Highlights
Background and Motivations

The goal of the author is to contribute methodologically to an extension of Sibson's information radius [1], concentrating on the analysis of specific parametric families of distributions called exponential families [2]. Let (X, F) denote a measurable space [3] with sample space X and σ-algebra F on the set X.
In Theorem 2, a closed-form formula is given for calculating the information radius of order α between two densities of an exponential family when 1/α is an integer.
When p belongs to an exponential family P (P may differ from Q) with cumulant function F_P, sufficient statistic t_P(x), auxiliary carrier term k_P(x), and natural parameter θ, the entropy [61] is expressed as follows: h[p] = F_P(θ) − ⟨θ, ∇F_P(θ)⟩ − E_p[k_P(x)].
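As a quick numerical sanity check (not from the paper), the entropy formula h[p] = F(θ) − ⟨θ, ∇F(θ)⟩ − E_p[k(x)] can be instantiated on the exponential distribution Exp(λ), written in exponential-family form with t(x) = x, θ = −λ, F(θ) = −log(−θ), and k(x) = 0; its known differential entropy is 1 − log λ:

```python
import math

# Entropy of an exponential-family density via
#   h[p] = F(θ) − ⟨θ, ∇F(θ)⟩ − E_p[k(x)],
# illustrated on Exp(λ): θ = −λ, t(x) = x, F(θ) = −log(−θ), k(x) = 0.

def exp_family_entropy_exponential(lam):
    theta = -lam              # natural parameter θ
    F = -math.log(-theta)     # cumulant function F(θ) = −log λ
    gradF = -1.0 / theta      # ∇F(θ) = E[t(x)] = 1/λ (mean of Exp(λ))
    Ek = 0.0                  # carrier term k(x) = 0, so E_p[k(x)] = 0
    return F - theta * gradF - Ek

lam = 2.5
h = exp_family_entropy_exponential(lam)
closed_form = 1.0 - math.log(lam)   # textbook entropy of Exp(λ)
assert abs(h - closed_form) < 1e-12
```

The function name and the choice of Exp(λ) are illustrative assumptions; any other exponential family (Gaussian, Poisson, ...) would work the same way with its own (t, F, k).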
Summary
The goal of the author is to contribute methodologically to an extension of Sibson's information radius [1], concentrating on the analysis of specific parametric families of distributions called exponential families [2]. Sibson (Robin Sibson (1944–2017) is renowned for inventing natural neighbour interpolation [40]) [1] considered both the Rényi α-divergence [33] D_α^R and the Rényi α-weighted mean M_α^R := M_{g_α} to define the information radius R_α of order α of a weighted set P = {(w_i, p_i)}_{i=1}^n of densities p_i as the following minimization problem: R_α(P) := min_{c∈D} R_α(P, c). The optimal density c_α^* = arg min_{c∈D} R_α(P, c) yielding the information radius R_α(P) can be interpreted as a generalized centroid (extending the notion of Fréchet means [48]) with respect to (M_α^R, D_α^R), where an (M, D)-centroid is given in Definition 1. The information radius of order α of a weighted set of distributions is upper bounded by the discrete Rényi entropy of order 1/α of the weight distribution: R_α(P) ≤ H^R_{1/α}[w], where H^R_α[w] := (1/(1−α)) log ∑_i w_i^α.
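In the limit α → 1 the Rényi quantities reduce to Shannon ones: the centroid problem becomes min_c ∑_i w_i KL(p_i : c), whose minimizer over all densities is the mixture c* = ∑_i w_i p_i, the optimal value is the Jensen-Shannon diversity index, and the upper bound becomes the Shannon entropy H[w] of the weights. A minimal numerical sketch of this α = 1 case on discrete distributions (the distributions and weights here are arbitrary test data, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """Discrete Kullback-Leibler divergence KL(p : q)."""
    return float(np.sum(p * np.log(p / q)))

# Weighted set P = {(w_i, p_i)} of n discrete distributions on d atoms.
n, d = 3, 5
P = rng.dirichlet(np.ones(d), size=n)
w = np.array([0.5, 0.3, 0.2])

# For (M, D) = (arithmetic mean, KL), the centroid minimizing
# sum_i w_i KL(p_i : c) is the mixture c* = sum_i w_i p_i, and the
# optimal value is the Jensen-Shannon diversity index of P.
c_star = w @ P
R1 = sum(wi * kl(pi, c_star) for wi, pi in zip(w, P))

# Sanity check: random candidate centroids never beat the mixture.
for _ in range(100):
    c = rng.dirichlet(np.ones(d))
    assert sum(wi * kl(pi, c) for wi, pi in zip(w, P)) >= R1 - 1e-12

# Upper bound by the Shannon entropy of the weight distribution,
# the α → 1 limit of R_α(P) ≤ H^R_{1/α}[w].
H_w = -float(np.sum(w * np.log(w)))
assert 0.0 <= R1 <= H_w
```

For general α one would replace KL by the Rényi α-divergence and the arithmetic mean by the Rényi α-weighted mean, and solve the minimization numerically, since the closed-form mixture solution is specific to the α = 1 case.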