Abstract

The Gibbs-Boltzmann distribution offers a physically interpretable way to massively reduce the dimensionality of high dimensional probability distributions, where the extensive variables are `features' and the intensive variables are `descriptors'. However, not all probability distributions can be modeled using the Gibbs-Boltzmann form. Here, we present Thermodynamic Manifold Inference (TMI), a thermodynamic approach to approximate a collection of arbitrary distributions. TMI simultaneously learns intensive and extensive variables from data and achieves dimensionality reduction through a multiplicative, positive-valued, and interpretable decomposition of the data. Importantly, the reduced dimensional space of intensive parameters is not homogeneous: the Gibbs-Boltzmann distribution defines an analytically tractable Riemannian metric on the space of intensive variables, allowing us to calculate geodesics and volume elements. We illustrate the applications of TMI with multiple real and artificial data sets and discuss possible extensions.
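
For reference, the Gibbs-Boltzmann form assumed here is the standard maximum-entropy exponential family (the paper's exact indexing, e.g. a per-distribution index $a$, may differ):
\[
p(\vec{x}\,|\,\vec{\alpha}) = \frac{1}{Z(\vec{\alpha})}\exp\!\Big(-\sum_{k}\alpha_{k}\,Y_{k}(\vec{x})\Big),
\qquad
Z(\vec{\alpha}) = \sum_{\vec{x}}\exp\!\Big(-\sum_{k}\alpha_{k}\,Y_{k}(\vec{x})\Big),
\]
with intensive variables $\alpha_k$ (descriptors) and extensive variables $Y_k$ (features). For this family, the Fisher-Rao metric on the intensive variables is analytically tractable,
\[
g_{kl}(\vec{\alpha}) = \frac{\partial^{2}\log Z}{\partial\alpha_{k}\,\partial\alpha_{l}} = \mathrm{Cov}\,(Y_{k},Y_{l}),
\]
which is presumably the Riemannian metric referred to above; geodesics and volume elements then follow from $g_{kl}$ in the usual way.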

Highlights

  • Over the past few years, our ability to collect high dimensional data has improved substantially

  • Differentiating with respect to the intensive and the extensive variables and setting the derivatives to zero, we find that the intensive variables are fixed points of nonlinear equations of the form $q_a(\alpha)\,Y_{ka} = \ldots$ (see the numerical sketch after this list)

  • We can ensure that the extensive variables $Y_{ka}$ and $Y_{kb}$ corresponding to neighboring states $a$ and $b$ are similar to each other by introducing regularizing constraints: $n_{ab}\,(Y_{ka} - Y_{kb})^2 < C_k \;\;\forall\, k$ (9)
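
Because the fixed-point equation above is truncated, the following Python sketch is only illustrative: it uses the standard Lee-Seung multiplicative updates for KL-divergence NMF as a stand-in for TMI's alternating fixed-point iteration over a positive-valued decomposition $V \approx WH$, and evaluates a soft version of the neighborhood constraint in Eq. (9). The function names, the KL updates, and the quadratic penalty are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def nmf_kl(V, r, n_iter=200, eps=1e-10, seed=0):
    """Multiplicative-update NMF minimizing KL divergence, V ~= W @ H.
    Illustrative stand-in: columns of W play the role of extensive
    variables ('features'); columns of H play the role of per-state
    intensive variables ('descriptors')."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Alternating fixed-point (Lee-Seung) updates; each step keeps
        # the factors positive and does not increase the KL objective.
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
        W *= ((V / (W @ H + eps)) @ H.T) / (ones @ H.T + eps)
    return W, H

def smoothness_penalty(H, neighbor_pairs):
    """Soft analogue of Eq. (9): sum over neighboring states (a, b) of
    ||H[:, a] - H[:, b]||^2. Adding this (times a weight) to the
    objective encourages similar descriptors for neighboring states."""
    return sum(np.sum((H[:, a] - H[:, b]) ** 2) for a, b in neighbor_pairs)

# Minimal usage on synthetic nonnegative data:
V = np.abs(np.random.default_rng(1).random((40, 25)))
W, H = nmf_kl(V, r=4)
print(smoothness_penalty(H, [(0, 1), (1, 2)]))
```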


Summary

INTRODUCTION

Over the past few years, our ability to collect high dimensional data has improved substantially. In matrix factorization approaches, the high dimensional data (in the form of a matrix) are expressed as a product of two or more simpler (for example, sparse or low rank) matrices. In contrast, methods such as diffusion maps [3], Laplacian Eigenmaps [4], Isomap [5], t-SNE (t-distributed stochastic neighbor embedding) [6], and UMAP (uniform manifold approximation and projection) [7] are based on manifold learning. Motivating the Gibbs-Boltzmann distribution using the maximum entropy principle [8,9] has allowed us to employ it to model probabilities in a variety of complex systems, such as ensembles of protein sequences [10], parameters of signaling networks [11,12], collective firing of neurons [13], and collective motions of birds [14]. This approach has also been used to approximate the dynamics of chemical reaction networks [15,16].
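
To spell out that motivation in one step (a standard derivation, with notation assumed here rather than taken from the paper): maximizing the entropy $-\sum_{\vec{x}} p(\vec{x})\log p(\vec{x})$ subject to normalization and to fixed averages $\sum_{\vec{x}} p(\vec{x})\,Y_k(\vec{x}) = \bar{Y}_k$ of the extensive variables, and setting the derivative of the Lagrangian with respect to $p(\vec{x})$ to zero, yields
\[
p(\vec{x}) = \frac{1}{Z(\vec{\alpha})}\,\exp\!\Big(-\sum_k \alpha_k\,Y_k(\vec{x})\Big),
\]
with the Lagrange multipliers $\alpha_k$ playing the role of intensive variables.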

TMI APPROXIMATES ARBITRARY DISTRIBUTIONS
NUMERICAL INFERENCE OF INTENSIVE AND EXTENSIVE VARIABLES
TMI INTRODUCES A RIEMANNIAN DISTANCE METRIC
LEARNING ISING MODEL FROM DATA
ANALYSIS OF HANDWRITTEN DIGITS
TMI PERFORMANCE IN DATA RECONSTRUCTION AND CLASSIFICATION
DISCUSSION
Bag of words data from NIPS conferences
Findings
Implementation of non-negative matrix factorization
