Abstract

The Gibbs-Boltzmann distribution offers a physically interpretable way to massively reduce the dimensionality of high dimensional probability distributions, where the extensive variables are `features' and the intensive variables are `descriptors'. However, not all probability distributions can be modeled using the Gibbs-Boltzmann form. Here, we present Thermodynamic Manifold Inference (TMI), a thermodynamic approach to approximate a collection of arbitrary distributions. TMI simultaneously learns intensive and extensive variables from data and achieves dimensionality reduction through a multiplicative, positive-valued, and interpretable decomposition of the data. Importantly, the reduced dimensional space of intensive parameters is not homogeneous: the Gibbs-Boltzmann distribution defines an analytically tractable Riemannian metric on the space of intensive variables, allowing us to calculate geodesics and volume elements. We illustrate the applications of TMI with multiple real and artificial data sets and discuss possible extensions.
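
For reference, the Gibbs-Boltzmann form assumed here is the standard maximum-entropy exponential family (the paper's exact indexing, e.g. a per-distribution index $a$, may differ):
\[
p(\vec{x}\,|\,\vec{\alpha}) = \frac{1}{Z(\vec{\alpha})}\exp\!\Big(-\sum_{k}\alpha_{k}\,Y_{k}(\vec{x})\Big),
\qquad
Z(\vec{\alpha}) = \sum_{\vec{x}}\exp\!\Big(-\sum_{k}\alpha_{k}\,Y_{k}(\vec{x})\Big),
\]
with intensive variables $\alpha_k$ (descriptors) and extensive variables $Y_k$ (features). For this family, the Fisher-Rao metric on the intensive variables is analytically tractable,
\[
g_{kl}(\vec{\alpha}) = \frac{\partial^{2}\log Z}{\partial\alpha_{k}\,\partial\alpha_{l}} = \mathrm{Cov}\,(Y_{k},Y_{l}),
\]
which is presumably the Riemannian metric referred to above; geodesics and volume elements then follow from $g_{kl}$ in the usual way.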

Highlights

  • Over the past few years, our ability to collect high dimensional data has improved substantially

  • Differentiating with respect to the intensive and the extensive variables and setting the derivatives to zero, we find that the intensive variables are fixed points of nonlinear equations of the form $q_a(\alpha)\,Y_{ka} = \ldots$ (see the numerical sketch after this list)

  • We can ensure that the extensive variables $Y_{ka}$ and $Y_{kb}$ corresponding to neighboring states $a$ and $b$ are similar to each other by introducing regularizing constraints: $n_{ab}\,(Y_{ka} - Y_{kb})^2 < C_k \;\;\forall\, k$ (9)
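
Because the fixed-point equation above is truncated, the following Python sketch is only illustrative: it uses the standard Lee-Seung multiplicative updates for KL-divergence NMF as a stand-in for TMI's alternating fixed-point iteration over a positive-valued decomposition $V \approx WH$, and evaluates a soft version of the neighborhood constraint in Eq. (9). The function names, the KL updates, and the quadratic penalty are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def nmf_kl(V, r, n_iter=200, eps=1e-10, seed=0):
    """Multiplicative-update NMF minimizing KL divergence, V ~= W @ H.
    Illustrative stand-in: columns of W play the role of extensive
    variables ('features'); columns of H play the role of per-state
    intensive variables ('descriptors')."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Alternating fixed-point (Lee-Seung) updates; each step keeps
        # the factors positive and does not increase the KL objective.
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
        W *= ((V / (W @ H + eps)) @ H.T) / (ones @ H.T + eps)
    return W, H

def smoothness_penalty(H, neighbor_pairs):
    """Soft analogue of Eq. (9): sum over neighboring states (a, b) of
    ||H[:, a] - H[:, b]||^2. Adding this (times a weight) to the
    objective encourages similar descriptors for neighboring states."""
    return sum(np.sum((H[:, a] - H[:, b]) ** 2) for a, b in neighbor_pairs)

# Minimal usage on synthetic nonnegative data:
V = np.abs(np.random.default_rng(1).random((40, 25)))
W, H = nmf_kl(V, r=4)
print(smoothness_penalty(H, [(0, 1), (1, 2)]))
```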


Summary

INTRODUCTION

Over the past few years, our ability to collect high dimensional data has improved substantially. In matrix factorization approaches, the high dimensional data (in the form of a matrix) are expressed as a product of two or more simpler (for example, sparse or low rank) matrices. In contrast, methods such as diffusion maps [3], Laplacian Eigenmaps [4], Isomap [5], t-SNE (t-distributed stochastic neighbor embedding) [6], and UMAP (uniform manifold approximation and projection) [7] are based on manifold learning. Motivating the Gibbs-Boltzmann distribution using the maximum entropy principle [8,9] has allowed us to employ it to model probabilities in a variety of complex systems, such as ensembles of protein sequences [10], parameters of signaling networks [11,12], collective firing of neurons [13], and collective motions of birds [14]. This approach has also been used to approximate the dynamics of chemical reaction networks [15,16].
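
To spell out that motivation in one step (a standard derivation, with notation assumed here rather than taken from the paper): maximizing the entropy $-\sum_{\vec{x}} p(\vec{x})\log p(\vec{x})$ subject to normalization and to fixed averages $\sum_{\vec{x}} p(\vec{x})\,Y_k(\vec{x}) = \bar{Y}_k$ of the extensive variables, and setting the derivative of the Lagrangian with respect to $p(\vec{x})$ to zero, yields
\[
p(\vec{x}) = \frac{1}{Z(\vec{\alpha})}\,\exp\!\Big(-\sum_k \alpha_k\,Y_k(\vec{x})\Big),
\]
with the Lagrange multipliers $\alpha_k$ playing the role of intensive variables.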

TMI APPROXIMATES ARBITRARY DISTRIBUTIONS
NUMERICAL INFERENCE OF INTENSIVE AND EXTENSIVE VARIABLES
TMI INTRODUCES A RIEMANNIAN DISTANCE METRIC
LEARNING ISING MODEL FROM DATA
ANALYSIS OF HANDWRITTEN DIGITS
TMI PERFORMANCE IN DATA RECONSTRUCTION AND CLASSIFICATION
DISCUSSION
Bag of words data from NIPS conferences
Findings
Implementation of non-negative matrix factorization
