Abstract

A discrete system’s heterogeneity is measured by the Rényi heterogeneity family of indices (also known as Hill numbers or Hannah–Kay indices), whose units are the numbers equivalent. Unfortunately, numbers equivalent heterogeneity measures for non-categorical data require a priori (A) categorical partitioning and (B) pairwise distance measurement on the observable data space, thereby precluding application to problems with ill-defined categories or where semantically relevant features must be learned as abstractions from some data. We thus introduce representational Rényi heterogeneity (RRH), which transforms an observable domain onto a latent space upon which the Rényi heterogeneity is both tractable and semantically relevant. This method requires neither a priori binning nor definition of a distance function on the observable space. We show that RRH can generalize existing biodiversity and economic equality indices. Compared with existing indices on a beta-mixture distribution, we show that RRH responds more appropriately to changes in mixture component separation and weighting. Finally, we demonstrate the measurement of RRH in a set of natural images, with respect to abstract representations learned by a deep neural network. The RRH approach will further enable heterogeneity measurement in disciplines whose data do not easily conform to the assumptions of existing indices.

Highlights

  • Measuring heterogeneity is of broad scientific importance, for example in studies of biodiversity [1,2], resource concentration [3], and consistency of clinical trial results [4]

  • To highlight the generalizability of our approach to complex latent variable models, we provide an evaluation of representational Rényi heterogeneity (RRH) applied to the latent representations of a handwritten image dataset [22] learned by a variational autoencoder [23,24]

  • Compared to state-of-the-art comparator indices under a beta mixture distribution, RRH more reliably quantified the number of unique mixture components (Section 4.1), and under a deep generative model of image data, RRH was able to measure the effective number of distinct images with respect to latent continuous representations (Section 4.2)


Summary

Introduction

Measuring heterogeneity is of broad scientific importance, for example in studies of biodiversity (ecology and microbiology) [1,2], resource concentration (economics) [3], and consistency of clinical trial results (biostatistics) [4]. Mathematical ecologists have developed heterogeneity measures for non-categorical systems, which they generally call “functional diversity indices” [6,7,8,9,10,11]. These indices typically require a priori discretization and specification of a distance function on the observable space. The paper reviews Rényi heterogeneity and the various approaches by which it has been generalized for application on non-categorical spaces [8,10,21]. Limitations of these indices are highlighted, thereby motivating the section on representational Rényi heterogeneity.
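As a point of reference for the categorical case, the Rényi heterogeneity (Hill number) of order q for a probability vector p is commonly written as (sum_i p_i^q)^(1/(1-q)), with the limit q → 1 equal to the exponential of the Shannon entropy. The following is a minimal sketch of that standard formula only, not the paper's implementation; the function name `renyi_heterogeneity` and the example distributions are assumptions chosen for illustration.

```python
import math


def renyi_heterogeneity(p, q):
    """Hill number (numbers equivalent) of order q for a categorical distribution p.

    Hypothetical helper for illustration; p must be non-negative and sum to 1, q >= 0.
    """
    if q < 0:
        raise ValueError("q must be non-negative")
    if any(x < 0 for x in p) or not math.isclose(sum(p), 1.0, abs_tol=1e-9):
        raise ValueError("p must be a probability vector")

    # Categories with zero probability do not contribute to heterogeneity.
    support = [x for x in p if x > 0]

    if math.isclose(q, 1.0):
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in support))
    if math.isinf(q):
        # Limit q -> infinity: reciprocal of the largest probability.
        return 1.0 / max(support)
    return sum(x ** q for x in support) ** (1.0 / (1.0 - q))


# Illustrative distributions (assumed, not taken from the paper).
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]

for q in (0, 1, 2):
    print(q, renyi_heterogeneity(uniform, q), renyi_heterogeneity(skewed, q))
```

For a uniform distribution over four categories the index equals four at every order, which is what "numbers equivalent" units mean: the value is the number of equally likely categories that would yield the same heterogeneity. For the skewed distribution the index falls as q grows, because larger q places more weight on the most probable categories.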

Rényi Heterogeneity in Categorical Systems
Decomposition of Categorical Rényi Heterogeneity
Limitations of Categorical Rényi Heterogeneity
Non-Categorical Heterogeneity Indices
Numbers Equivalent Quadratic Entropy
Functional Hill Numbers
Leinster–Cobbold Index
Limitations of Existing Non-Categorical Heterogeneity Indices
Representational Rényi Heterogeneity
Rényi Heterogeneity on Categorical Representations
Rényi Heterogeneity on Non-Categorical Representations
Empirical Applications of Representational Rényi Heterogeneity
Comparison of Heterogeneity Indices Under a Mixture of Beta Distributions
Representational Rényi Heterogeneity is Scalable to Deep Learning Models
Findings
Discussion
