Abstract
We take a category-theoretic perspective on the relationship between probabilistic modeling and gradient-based optimization. We define two extensions of function composition to stochastic process subordination: one based on a co-Kleisli category and one based on the parameterization of a category with a Lawvere theory. We show how these extensions relate to the category of Markov kernels Stoch through a pushforward procedure. We extend stochastic processes to parametric statistical models and define a way to compose the likelihood functions of these models. We demonstrate how the maximum likelihood estimation procedure defines a family of identity-on-objects functors from categories of statistical models to the category of supervised learning algorithms Learn. Code to accompany this paper can be found on GitHub (https://github.com/dshieble/Categorical_Stochastic_Processes_and_Likelihood).
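The composition of stochastic processes described in the abstract can be illustrated with a small numerical sketch. The names below (`kernel_f`, `kernel_g`, `compose`) are illustrative and not taken from the paper's accompanying code; the sketch just shows Kleisli-style composition of two noisy maps by pushing samples forward, in the spirit of the category Stoch.

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel_f(x):
    # A stochastic "layer": a deterministic map plus Gaussian noise.
    return 2.0 * x + rng.normal(0.0, 0.1)

def kernel_g(y):
    # A second stochastic layer applied to the output of the first.
    return y ** 2 + rng.normal(0.0, 0.1)

def compose(g, f):
    # Kleisli-style composition: sample from f, then feed the sample into g.
    return lambda x: g(f(x))

h = compose(kernel_g, kernel_f)
samples = np.array([h(1.0) for _ in range(10000)])
# E[h(1.0)] = E[(2 + eps1)^2 + eps2] = 4 + Var(eps1) = 4.01
```

Because each layer carries its own noise, the uncertainty of the composite is built up from the uncertainties of the components, which is the compositional structure the paper formalizes.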
Highlights
The explosive success of machine learning over the last two decades has inspired theoretical work aimed at developing rigorous frameworks for reasoning about and extending machine learning algorithms.
Consider a physical system which has several components, each of which has some degree of aleatoric uncertainty.
Summary
The explosive success of machine learning over the last two decades has inspired theoretical work aimed at developing rigorous frameworks for reasoning about and extending machine learning algorithms. Many physical and biological processes will produce slightly different results on each run based on inherent randomness, such as randomness in turbulent fluid flows. For this reason, models that approximate physical systems often implicitly or explicitly produce a probability distribution over the possible outputs conditioned on some input [25]. A common training objective is the Kullback-Leibler (KL) divergence (which measures how one probability distribution differs from a second, reference probability distribution) between a distribution whose expected value is given by the model's output and P(y|X). In this way the structure of the model's aleatoric uncertainty is captured in its loss function (mean squared error, in the case of a Gaussian distribution with fixed variance). In this paper we describe an alternative strategy for constructing and composing parametric models such that we can explicitly characterize how different subsystems' uncertainties interact. We use this strategy to build a generalized framework for training neural networks that have stochastic processes as layers, and we define a family of subcategories of parametric statistical models over which we can use the maximum likelihood procedure to define a backpropagation functor into the category Learn of learning algorithms [12].
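The claim that the loss function reflects the model's aleatoric uncertainty can be checked with a short sketch. Assuming the model distribution is Gaussian with fixed variance (the setting the summary alludes to; the helper names `gaussian_nll` and `mse` are illustrative, not from the paper), the negative log-likelihood differs from mean squared error only by an affine transformation, so minimizing either yields the same fit.

```python
import numpy as np

def gaussian_nll(y_true, y_pred, sigma=1.0):
    """Average negative log-likelihood of y_true under N(y_pred, sigma^2)."""
    return (0.5 * np.mean((y_true - y_pred) ** 2) / sigma**2
            + 0.5 * np.log(2 * np.pi * sigma**2))

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 2.0, 0.5])
y_pred = np.array([0.8, 2.2, 0.4])

# With sigma = 1, NLL = 0.5 * MSE + 0.5 * log(2*pi): the two losses
# differ by a constant, so they share the same minimizers.
```

This is the sense in which a mean-squared-error loss implicitly encodes a Gaussian assumption about the model's output distribution.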