Abstract

We generalize the Jensen-Shannon divergence and the Jensen-Shannon diversity index by considering a variational definition with respect to a generic mean, thereby extending the notion of Sibson’s information radius. The variational definition applies to any arbitrary distance and yields a new way to define a Jensen-Shannon symmetrization of distances. When the variational optimization is further constrained to belong to prescribed families of probability measures, we get relative Jensen-Shannon divergences and their equivalent Jensen-Shannon symmetrizations of distances that generalize the concept of information projections. Finally, we touch upon applications of these variational Jensen-Shannon divergences and diversity indices to clustering and quantization tasks of probability measures, including statistical mixtures.

Highlights

  • Background and Motivations: The goal of the author is to methodologically contribute to an extension of Sibson's information radius [1] and to concentrate on the analysis of specific families of distributions called exponential families [2]. Let (X, F) denote a measurable space [3] with sample space X and σ-algebra F on the set X.

  • Theorem 2 gives a closed-form formula for calculating the information radius of order α between two densities of an exponential family when 1/α is an integer.

  • When p belongs to an exponential family P (P may be different from Q) with cumulant function F_P, sufficient statistics t_P(x), auxiliary carrier term k_P(x), and natural parameter θ, the entropy [61] is expressed as h[p] = F_P(θ) − ⟨θ, ∇F_P(θ)⟩ − E_p[k_P(x)] (see the sketch below).

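A minimal sketch (not from the paper) of this closed-form entropy: the formula h[p] = F_P(θ) − ⟨θ, ∇F_P(θ)⟩ − E_p[k_P(x)] is instantiated on the univariate Gaussian exponential family, whose sufficient statistics t(x) = (x, x²), natural parameters θ = (μ/σ², −1/(2σ²)), cumulant function, and zero carrier term are standard; the result matches the familiar Gaussian entropy ½ log(2πeσ²).

```python
import numpy as np

# Sketch (assumed example, not the paper's code): entropy of an exponential
# family density via  h[p_theta] = F(theta) - <theta, grad F(theta)> - E_p[k(x)],
# illustrated on the univariate Gaussian family with
#   t(x) = (x, x^2),  theta = (mu/sigma^2, -1/(2 sigma^2)),  k(x) = 0,
#   F(theta) = -theta1^2 / (4 theta2) + 0.5 * log(-pi / theta2).

def F_gauss(theta):
    """Cumulant (log-normalizer) of the univariate Gaussian family."""
    t1, t2 = theta
    return -t1**2 / (4.0 * t2) + 0.5 * np.log(-np.pi / t2)

def grad_F_gauss(theta):
    """Gradient of F: the expected sufficient statistics (E[x], E[x^2])."""
    t1, t2 = theta
    return np.array([-t1 / (2.0 * t2), t1**2 / (4.0 * t2**2) - 1.0 / (2.0 * t2)])

def entropy_expfam(theta, F, grad_F, expected_carrier=0.0):
    """h[p] = F(theta) - <theta, grad F(theta)> - E_p[k(x)]."""
    theta = np.asarray(theta, dtype=float)
    return F(theta) - theta @ grad_F(theta) - expected_carrier

mu, sigma = 1.5, 2.0
theta = np.array([mu / sigma**2, -1.0 / (2.0 * sigma**2)])
h = entropy_expfam(theta, F_gauss, grad_F_gauss)
print(h, 0.5 * np.log(2.0 * np.pi * np.e * sigma**2))  # both ~2.112
```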

Summary

Background and Motivations

The goal of the author is to methodologically contribute to an extension of Sibson's information radius [1] and to concentrate on the analysis of specific families of distributions called exponential families [2]. Sibson (Robin Sibson (1944–2017) is renowned for inventing natural neighbour interpolation [40]) [1] considered both the Rényi α-divergence [33] D_α^R and the Rényi α-weighted mean M_α^R := M_{g_α} to define the information radius R_α of order α of a weighted set P = {(w_i, p_i)}_{i=1}^n of densities p_i as the following minimization problem: R_α(P) := min_{c∈D} R_α(P, c). The optimal density c*_α = arg min_{c∈D} R_α(P, c) yielding the information radius R_α(P) can be interpreted as a generalized centroid (extending the notion of Fréchet means [48]) with respect to (M_α^R, D_α^R), where the (M, D)-centroid of P is defined as a density c minimizing the weighted mean M of the distances D(p_i : c) (Definition 1). The information radius of order α of a weighted set of distributions is upper bounded by the discrete Rényi entropy of order 1/α of the weight distribution: R_α(P) ≤ H^R_{1/α}[w], where H^R_α[w] := (1/(1−α)) log ∑_i w_i^α.
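As a hedged illustration (assumed example, covering only the α → 1 limit rather than the general case), the Rényi divergence then reduces to the Kullback–Leibler divergence, the (arithmetic mean, KL)-centroid of P is the mixture c* = ∑_i w_i p_i, and R_1(P) coincides with the Jensen–Shannon diversity index; the bound above reduces to R_1(P) ≤ H[w], the Shannon entropy of the weights.

```python
import numpy as np

# Sketch for the alpha -> 1 limit: the optimal centroid is the mixture
# c* = sum_i w_i p_i, and the information radius R_1(P) is the
# Jensen-Shannon diversity index sum_i w_i KL(p_i : c*), bounded by H[w].

def kl(p, q):
    """Kullback-Leibler divergence between discrete distributions."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def renyi_entropy(w, alpha):
    """H_alpha^R[w] = 1/(1-alpha) log sum_i w_i^alpha (Shannon entropy as alpha -> 1)."""
    if np.isclose(alpha, 1.0):
        return -float(np.sum(w * np.log(w)))
    return float(np.log(np.sum(w ** alpha)) / (1.0 - alpha))

# A toy weighted set P of discrete densities on a 4-letter alphabet.
w = np.array([0.5, 0.3, 0.2])
P = np.array([[0.70, 0.10, 0.10, 0.10],
              [0.10, 0.60, 0.20, 0.10],
              [0.25, 0.25, 0.25, 0.25]])

centroid = w @ P                                              # c* = sum_i w_i p_i
radius = sum(wi * kl(pi, centroid) for wi, pi in zip(w, P))   # JS diversity index

print("R_1(P) =", radius)
print("H[w]   =", renyi_entropy(w, 1.0))                      # upper bound for alpha = 1
assert radius <= renyi_entropy(w, 1.0) + 1e-12
```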

JS-Symmetrization of Distances Based on Generalized Information Radius
Relative Information Radius
Relative Jensen-Shannon Divergences
Conclusions