Abstract

We present a theoretical framework of probabilistic learning derived from the maximum probability (MP) theorem shown in this article. In this probabilistic framework, a model is defined as an event in the probability space, and a model, or the associated event (either the true underlying model or the parameterized model), has a quantified probability measure. This quantification is derived from the MP theorem, in which we show that an event's probability measure has an upper bound given its conditional distribution on an arbitrary random variable. In this alternative framework, the notion of model parameters is encompassed in the definition of the model, or the associated event, so the framework deviates from the conventional approach of assuming a prior on the model parameters. Instead, the regularizing effect of a prior over parameters is imposed by maximizing the probability of the model or, in information-theoretic terms, minimizing the information content of the model. The probability of a model in our framework is invariant to reparameterization and depends solely on the model's likelihood function. In addition, rather than maximizing the posterior as in a conventional Bayesian setting, the objective function in our framework is defined as the probability of set operations (e.g., intersection) on the event of the true underlying model and the event of the model at hand. Our theoretical framework adds clarity to probabilistic learning by solidifying the definition of probabilistic models, quantifying their probabilities, and providing a visual understanding of objective functions.

Impact Statement: The choice of prior distribution over the parameters of probabilistic machine learning models determines the regularization of learning algorithms in the Bayesian perspective. The complexity of choosing a prior over the parameters, and of the resulting form of regularization, grows with the complexity of the models being used; finding priors for the parameters of complex models is therefore often intractable. We address this problem by uncovering the maximum probability (MP) theorem as a direct consequence of Kolmogorov's probability theory. Through the lens of the MP theorem, the process of regularizing models can be understood and automated: regularization is cast as maximization of the probability of the model, a quantity the MP theorem characterizes and that is determined by the model's behavior. The effects of maximizing the probability of the model can be backpropagated in a gradient-based optimization process.
Consequently, the MP framework provides a form of black-box regularization and eliminates the need for case-by-case analysis of models to determine priors.
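
The abstract describes the MP theorem only informally. One reconstruction consistent with that description (our illustrative reading, assuming all densities exist, not a verbatim statement from the paper) follows directly from Kolmogorov's axioms: for any event A and any random variable X,

\[
P(A)\,f_{X\mid A}(x) \;=\; P(A\mid X{=}x)\,f_X(x) \;\le\; f_X(x)
\quad\Longrightarrow\quad
P(A)\;\le\;\inf_{x\,:\,f_{X\mid A}(x)>0}\frac{f_X(x)}{f_{X\mid A}(x)}.
\]

Since the information content of an event is $I(A) = -\log P(A)$, maximizing a model's probability and minimizing its information content are the same objective, which is the equivalence the abstract invokes.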

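To make the backpropagation claim in the impact statement concrete, below is a minimal, hypothetical PyTorch sketch, not the authors' implementation. It treats a Gaussian likelihood q as the model's conditional density f_{X|A}, fixes a broad reference marginal p_X (an assumption introduced here for illustration), and uses the relaxation log P(A) <= E_q[log(f_X / f_{X|A})] = -KL(q || p_X) of the bound above, so minimizing KL(q || p_X) acts as a differentiable surrogate for maximizing the model's probability. The weight lam is a made-up hyperparameter.

    # Hypothetical sketch of MP-style regularization; not the authors' code.
    import torch
    from torch.distributions import Normal, kl_divergence

    torch.manual_seed(0)
    data = 2.0 + 0.5 * torch.randn(100)        # toy observations

    mu = torch.tensor(0.0, requires_grad=True)  # learnable model parameters
    log_sigma = torch.tensor(0.0, requires_grad=True)
    reference = Normal(0.0, 10.0)               # assumed broad marginal p_X

    opt = torch.optim.Adam([mu, log_sigma], lr=0.05)
    lam = 0.01                                  # made-up regularization weight

    for _ in range(500):
        q = Normal(mu, log_sigma.exp())         # model's conditional p_{X|A}
        nll = -q.log_prob(data).mean()          # data-fit (likelihood) term
        reg = kl_divergence(q, reference)       # surrogate: raises -KL(q || p_X)
        loss = nll + lam * reg                  # both terms backpropagate
        opt.zero_grad()
        loss.backward()
        opt.step()

No case-by-case prior over (mu, sigma) appears anywhere; the regularizer is computed from the model's own likelihood, which is the sense in which the framework is described as black-box.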