Abstract

Information theory provides a mathematical foundation to measure uncertainty in belief. Belief is represented by a probability distribution that captures our understanding of an outcome’s plausibility. Information measures based on Shannon’s concept of entropy include realization information, Kullback–Leibler divergence, Lindley’s information in an experiment, cross entropy, and mutual information. We derive a general theory of information from first principles that accounts for evolving belief and recovers all of these measures. Rather than simply gauging uncertainty, information is understood in this theory to measure change in belief. We may then regard entropy as the information we expect to gain upon realization of a discrete latent random variable. This theory of information is compatible with the Bayesian paradigm, in which rational belief is updated as evidence becomes available. Furthermore, this theory admits novel measures of information with well-defined properties, which we explore through both analysis and experiment. This view of information illuminates the study of machine learning by allowing us to quantify the information captured by a predictive model and distinguish it from the residual information contained in training data. We gain related insights regarding feature selection, anomaly detection, and novel Bayesian approaches.
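
To make the abstract’s central claim concrete, the following standard identities (a sketch in our own notation, not quoted from the paper) show how entropy arises as the information we expect to gain upon realization, and how Kullback–Leibler divergence measures a change in belief from q to p:

```latex
% Realizing X = x updates belief from q to the point mass \delta_x;
% the information gained is the log-ratio of new to old plausibility:
I(q \to \delta_x) \;=\; \log\frac{1}{q(x)} \;=\; -\log q(x).
% Its expectation under q is Shannon entropy: the information we
% expect to gain upon realization of the discrete variable X.
H(q) \;=\; \mathbb{E}_{x \sim q}\!\left[-\log q(x)\right].
% More generally, updating belief from q to p carries, in expectation
% under the new belief p, the Kullback--Leibler divergence:
D_{\mathrm{KL}}(p \,\|\, q) \;=\; \mathbb{E}_{x \sim p}\!\left[\log\frac{p(x)}{q(x)}\right].
```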

Highlights

  • This work integrates essential properties of information embedded within Shannon’s derivation of entropy [1] and the Bayesian perspective [2,3,4], which identifies probability with plausibility. We pursued this investigation to understand how to rigorously apply information-theoretic concepts to the theory of inference and machine learning.

  • By axiomatizing the properties of information we desire, we show that a unique formulation follows that subsumes critical properties of Shannon’s construction of entropy (Shannon’s original postulates are restated after this list).

  • By formulating principles that articulate how we may regard information as a reasonable expectation that measures change in belief, we derive a theory of information that places existing measures of entropic information in a coherent, unified framework.
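
For reference, Shannon’s construction [1] proceeds in just this way: he postulated three properties for an uncertainty measure over a discrete distribution and showed that they force a unique form, up to a positive constant. A restatement:

```latex
% Shannon's postulates for an uncertainty measure H(p_1, \dots, p_n):
% 1. H is continuous in the p_i.
% 2. For uniform distributions p_i = 1/n, H increases monotonically in n.
% 3. If a choice is broken into successive choices, H is the weighted
%    sum of the individual values of H (the grouping property).
% The unique measure satisfying all three, up to a constant K > 0:
H(p_1, \dots, p_n) \;=\; -K \sum_{i=1}^{n} p_i \log p_i .
```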

Summary

Introduction

This work integrates essential properties of information embedded within Shannon’s derivation of entropy [1] and the Bayesian perspective [2,3,4], which identifies probability with plausibility. We pursued this investigation to understand how to rigorously apply information-theoretic concepts to the theory of inference and machine learning. In particular, we wanted to quantify how the predictions given by machine learning models evolve.
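
As a minimal sketch of what quantifying the evolution of a model’s predictions could look like (the names kl_divergence, before, and after are illustrative, not the paper’s implementation), one can measure the information gained when a discrete predictive distribution is updated:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) in nats for discrete distributions given as arrays."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical predictive distributions over three classes, before and
# after an update (e.g., one more epoch of training, or conditioning
# on new evidence in a Bayesian model).
before = np.array([1/3, 1/3, 1/3])   # maximal uncertainty
after = np.array([0.7, 0.2, 0.1])

# Change in belief carried by the update, as the expected log-ratio of
# new to old plausibility under the new belief.
print(f"information gained: {kl_divergence(after, before):.3f} nats")
```

Here the update from a uniform prediction carries about 0.3 nats; a sequence of such divergences between successive predictive distributions traces how belief evolves over training.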

Shortcomings with Standard Approaches
Our Contributions
Background and Notation
Bayesian Reasoning
Probability Notation
Reasonable Expectation and Rational Belief
Remarks on Bayesian Objectivism and Subjectivism
Shannon’s Properties of Entropy
Postulates
Principal Result
Regarding the Support of Expectation
Information Density
Information Pseudometrics
Corollaries and Interpretations
Entropy
Information in an Observation
Potential Information
Consistent Optimization Analysis
Discrepancy Functions
Jaynes Maximal Uncertainty
Remarks on Fisher Information
Information in Inference and Machine Learning
Machine Learning Information
Inference Information Bounds
Inference Information Constraints
Explicit Information Constraints
Implicit Information Constraints
Negative Information
Negative Information in Continuous Inference
Negative Information in MNIST Model with Mislabeled Data
Conclusions
Findings
Future Work