Abstract

Probabilistic programming languages allow programmers to write down conditional probability distributions that represent statistical and machine learning models as programs that use observe statements. These programs are run by accumulating likelihood at each observe statement, and using the likelihood to steer random choices and weigh results with inference algorithms such as importance sampling or MCMC. We argue that naive likelihood accumulation does not give desirable semantics and leads to paradoxes when an observe statement is used to condition on a measure-zero event, particularly when the observe statement is executed conditionally on random data. We show that the paradoxes disappear if we explicitly model measure-zero events as a limit of positive-measure events, and that we can execute these types of probabilistic programs by accumulating infinitesimal probabilities rather than probability densities. Our extension improves probabilistic programming languages as an executable notation for probability distributions: it makes them better behaved and more expressive, because the programmer can be explicit about which limit is intended when conditioning on an event of measure zero.

Highlights

  • Probabilistic programming languages such as Stan [Carpenter et al 2017], Church [Goodman et al 2008], and Anglican [Wood et al 2014] allow programmers to express probabilistic models in statistics and machine learning in a structured way, and run these models with generic inference algorithms such as importance sampling, Metropolis-Hastings, SMC, and HMC.

  • The pragmatist says that probabilistic programs are a convenient way to write down a likelihood function, and the purist says that probabilistic programs are a notation for structured probabilistic models

  • We identify a problem with existing probabilistic programming languages, in which likelihood accumulation with probability densities can result in three different types of paradoxes when conditioning on a measure-zero event
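One way such a paradox can arise, sketched here under assumptions that are not taken from the paper: an observe statement runs only when a fair coin comes up heads, and the observed quantity is a length. With naive density accumulation, the branch that runs the observe is weighted by a probability density, while the branch that skips it keeps weight 1. A density carries units, so rescaling from meters to centimeters changes the inferred probability that the branch was taken. The model, the numbers, and the function names below are illustrative only; the sketch is in Python, not the authors' Julia DSL.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_branch_taken(mu, sigma, x, prior=0.5):
    """Posterior probability that the coin chose the observing branch,
    computed by naive density accumulation (hypothetical helper)."""
    w_taken = prior * normal_pdf(x, mu, sigma)  # weight multiplied by a density
    w_skipped = (1.0 - prior) * 1.0             # no observe, weight stays 1
    return w_taken / (w_taken + w_skipped)


# The same physical situation expressed in two unit systems.
density_m = normal_pdf(2.0, 1.7, 0.5)        # meters
density_cm = normal_pdf(200.0, 170.0, 50.0)  # centimeters

p_m = posterior_branch_taken(1.7, 0.5, 2.0)
p_cm = posterior_branch_taken(170.0, 50.0, 200.0)

print(f"density ratio m/cm:     {density_m / density_cm:.1f}")
print(f"P(branch taken), m:     {p_m:.4f}")
print(f"P(branch taken), cm:    {p_cm:.4f}")
```

The two posteriors differ even though only the unit of measurement changed, which is the kind of ill-behaved result that motivates treating the measure-zero event as an explicit limit.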


Summary

INTRODUCTION

Probabilistic programming languages such as Stan [Carpenter et al 2017], Church [Goodman et al 2008], and Anglican [Wood et al 2014] allow programmers to express probabilistic models in statistics and machine learning in a structured way, and run these models with generic inference algorithms such as importance sampling, Metropolis-Hastings, SMC, and HMC. We identify a problem with existing probabilistic programming languages, in which likelihood accumulation with probability densities can result in three different types of paradoxes when conditioning on a measure-zero event.

We propose a change to probabilistic programming languages to avoid the paradoxes of the continuous measure-zero case, by changing the observe construct to condition on measure-zero events E as an explicit limit ε → 0 of Eε (Sections 4 and 5). We also provide:

  • a method for computing the limit by accumulating infinitesimal probabilities instead of probability densities, which we use to implement the adjusted observe construct,

  • a theorem that shows that infinitesimal probabilities correctly compute the limit of Eε, ensuring that programs that use observe on measure-zero events are paradox free,

  • a translation from the existing observe construct to our new observe construct, which gives the same output if the original program was non-paradoxical,

  • language support for parameter transformations, which we use to show that the meaning of programs in our language is stable under parameter transformations,

  • an implementation of our language as an embedded DSL in Julia [Jacobs 2020] (Section 6).
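The idea of accumulating infinitesimal probabilities can be sketched as follows. This is a minimal illustration in Python, not the authors' Julia implementation: the `Infinitesimal` class, the `observe_interval` helper, and the example numbers are all hypothetical. A weight is represented as r·ε^n, an observe on an interval of width w·ε contributes density × w × ε, and when weights are summed the lowest power of ε dominates as ε → 0.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Infinitesimal:
    """A number r * eps**n, where eps is a positive infinitesimal."""
    r: float
    n: int = 0

    def __mul__(self, other):
        return Infinitesimal(self.r * other.r, self.n + other.n)

    def __add__(self, other):
        # Equal powers of eps add; otherwise the lower power dominates.
        if self.n == other.n:
            return Infinitesimal(self.r + other.r, self.n)
        if self.r == 0.0:
            return other
        if other.r == 0.0:
            return self
        return self if self.n < other.n else other

    def __truediv__(self, other):
        return Infinitesimal(self.r / other.r, self.n - other.n)

    def limit(self):
        """Value as eps -> 0, assuming r >= 0."""
        if self.n > 0 or self.r == 0.0:
            return 0.0
        return self.r if self.n == 0 else math.inf


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def observe_interval(x, mu, sigma, width):
    """Probability that Normal(mu, sigma) falls in an interval of
    width `width * eps` around x, to first order in eps."""
    return Infinitesimal(normal_pdf(x, mu, sigma) * width, 1)


def posterior_branch_taken(mu, sigma, x, width, prior=0.5):
    """Observe runs only in one branch of a fair coin flip."""
    taken = Infinitesimal(prior) * observe_interval(x, mu, sigma, width)
    skipped = Infinitesimal(1.0 - prior)
    return (taken / (taken + skipped)).limit()


# Same situation in meters and in centimeters; the interval width is
# rescaled together with the data.
p_m = posterior_branch_taken(1.7, 0.5, 2.0, width=0.01)
p_cm = posterior_branch_taken(170.0, 50.0, 200.0, width=1.0)

# When both alternatives observe, the powers of eps cancel in the ratio.
a = observe_interval(2.0, 1.7, 0.5, width=0.01)
b = observe_interval(2.0, 1.9, 0.5, width=0.01)
ratio = (a / (a + b)).limit()

print(p_m, p_cm, ratio)
```

In this sketch the branch that conditions on the infinitesimally narrow interval receives posterior probability 0 in both unit systems, because its weight carries an extra factor of ε relative to the branch without an observe. When every alternative carries the same power of ε, the ratio is an ordinary real number, and the explicit interval width makes it invariant under a change of units.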

ON THE EVENT THAT OBSERVE CONDITIONS ON
THREE TYPES OF PARADOXES
Paradox of Type 1
Paradox of Type 2
Paradox of Type 3
AVOIDING EVENTS OF MEASURE ZERO WITH INTERVALS
Conditioning on Measure Zero Events as a Limit of Positive Measure Events
USING INFINITESIMAL NUMBERS TO HANDLE MEASURE-ZERO OBSERVATIONS
Intervals of Infinitesimal Width Make Paradoxes Disappear
Importance Sampling with Infinitesimal Probabilities
The Correspondence Between Observe on Points and Observe on Intervals
Parameter Transformations as a Language Feature
IMPLEMENTATION IN JULIA
Findings
CONCLUSION & FUTURE WORK