Abstract

We take a category-theoretic perspective on the relationship between probabilistic modeling and gradient-based optimization. We define two extensions of function composition to stochastic process subordination: one based on a co-Kleisli category and one based on the parameterization of a category with a Lawvere theory. We show how these extensions relate to the category of Markov kernels Stoch through a pushforward procedure. We extend stochastic processes to parametric statistical models and define a way to compose the likelihood functions of these models. We demonstrate how the maximum likelihood estimation procedure defines a family of identity-on-objects functors from categories of statistical models to the category of supervised learning algorithms Learn. Code to accompany this paper can be found on GitHub (https://github.com/dshieble/Categorical_Stochastic_Processes_and_Likelihood).
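As a rough illustration of the kind of composition studied here, the sketch below (our own, not the accompanying GitHub code) composes two linear-Gaussian layers and evaluates the likelihood of the composite model: because an affine map plus Gaussian noise pushes a Gaussian forward to another Gaussian, the composite conditional distribution stays Gaussian and its log-likelihood is available in closed form. The class name `GaussianLayer` and the helper functions are hypothetical and only loosely mirror the paper's categorical constructions.

```python
# Minimal sketch (assumptions, not the paper's code): composing two
# linear-Gaussian "layers" h -> A h + b + eps, eps ~ N(0, Sigma), and
# evaluating the likelihood of the composed stochastic process.
import numpy as np
from scipy.stats import multivariate_normal


class GaussianLayer:
    """A stochastic map x -> A x + b + eps with eps ~ N(0, Sigma)."""

    def __init__(self, A, b, Sigma):
        self.A, self.b, self.Sigma = A, b, Sigma

    def push_gaussian(self, mean, cov):
        # Pushforward of N(mean, cov) through the affine-plus-noise map;
        # the result is again Gaussian, so composition stays Gaussian-valued.
        new_mean = self.A @ mean + self.b
        new_cov = self.A @ cov @ self.A.T + self.Sigma
        return new_mean, new_cov


def composite_log_likelihood(layers, x, y):
    # Treat the input x as a point mass and push it through each layer;
    # the composite conditional distribution of y given x is Gaussian.
    mean, cov = x, np.zeros((x.size, x.size))
    for layer in layers:
        mean, cov = layer.push_gaussian(mean, cov)
    return multivariate_normal.logpdf(y, mean=mean, cov=cov)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer1 = GaussianLayer(rng.normal(size=(3, 2)), np.zeros(3), 0.1 * np.eye(3))
    layer2 = GaussianLayer(rng.normal(size=(2, 3)), np.ones(2), 0.2 * np.eye(2))
    x, y = np.array([1.0, -1.0]), np.array([0.5, 0.0])
    print(composite_log_likelihood([layer1, layer2], x, y))
```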

Highlights

  • The explosive success of machine learning over the last two decades has inspired theoretical work aimed at developing rigorous frameworks for reasoning about and extending machine learning algorithms

  • Consider a physical system which has several components, each of which has some degree of aleatoric uncertainty


Summary

Introduction

The explosive success of machine learning over the last two decades has inspired theoretical work aimed at developing rigorous frameworks for reasoning about and extending machine learning algorithms. Many biological processes will produce slightly different results on each run because of factors like randomness in turbulent fluid flows. For this reason, models that approximate physical systems often implicitly or explicitly produce a probability distribution over the possible outputs conditioned on some input [25]. For example, we might train a model to minimize the Kullback-Leibler (KL) divergence (which measures how one probability distribution differs from a second, reference probability distribution) between a fixed-variance Gaussian distribution whose expected value is given by the model’s output and P(y|X). In this way the structure of the model’s aleatoric uncertainty is captured in its loss function (mean squared error in this case). In this paper we describe an alternative strategy for constructing and composing parametric models such that we can explicitly characterize how different subsystems’ uncertainties interact. We use this strategy to build a generalized framework for training neural networks that have stochastic processes as layers, and we define a family of subcategories of parametric statistical models over which we can use the maximum likelihood procedure to define a backpropagation functor into the category Learn of learning algorithms [12].
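To make the remark about mean squared error concrete, the short check below (our illustration, not code from the paper) verifies numerically that the negative log-likelihood of a fixed-variance Gaussian centered at the model's output differs from mean squared error only by an affine rescaling, so the two objectives have the same minimizer; the variable names and the noise scale `sigma` are assumptions made for the example.

```python
# Numerical check (illustrative assumption): Gaussian negative log-likelihood
# with fixed variance equals MSE up to an affine rescaling.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100)          # observed targets
preds = rng.normal(size=100)      # hypothetical model outputs
sigma = 1.0                       # assumed fixed noise scale

mse = np.mean((y - preds) ** 2)
nll = np.mean(0.5 * ((y - preds) / sigma) ** 2
              + 0.5 * np.log(2 * np.pi * sigma ** 2))

# nll == mse / (2 * sigma**2) + constant, so both losses share a minimizer.
assert np.isclose(nll, mse / (2 * sigma ** 2)
                  + 0.5 * np.log(2 * np.pi * sigma ** 2))
```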

Preliminaries
Categories
Random Variables and Independence in BorelStoch
The co-Kleisli Construction
Independence and Dependence in CEuc
The Parameterization Construction
An extension of Para
Lawvere Parameterization
Applying Para to Euc
Parameterized Statistical Models
The Category DF
A subcategory of Gaussian-preserving transformations
Relationship to Gauss
Expectation Composition
Likelihood and Learning
Conditional Likelihood
Maximum Likelihood
Learning from Likelihoods
Backpropagation Functors
Discussion and Future Work
Proof of Proposition 5