Abstract

The question of how best to estimate a continuous probability density from finite data is an intriguing open problem at the interface of statistics and physics. Previous work has argued that this problem can be addressed in a natural way using methods from statistical field theory. Here I describe results that allow this field-theoretic approach to be rapidly and deterministically computed in low dimensions, making it practical for use in day-to-day data analysis. Importantly, this approach does not impose a privileged length scale for smoothness of the inferred probability density, but rather learns a natural length scale from the data due to the tradeoff between goodness of fit and an Occam factor. Open source software implementing this method in one and two dimensions is provided.
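
To make the advertised tradeoff concrete, the smoothness length scale can be scored by the Bayesian evidence. The decomposition below is a schematic sketch; the symbol ℓ for the length scale and the saddle-point (Laplace) form of the Occam factor are standard conventions assumed here rather than notation quoted from the paper:

    p(\mathrm{data} \mid \ell) \;=\; \int \mathcal{D}Q \; p(\mathrm{data} \mid Q)\, p(Q \mid \ell)
    \;\approx\; \underbrace{p(\mathrm{data} \mid Q_\ell)}_{\text{goodness of fit}} \times \text{(Occam factor)}

Decreasing ℓ lets Qℓ track the data more closely, improving the first factor, but the Occam factor, which accounts for fluctuations around Qℓ, shrinks as the prior becomes more permissive; maximizing the evidence over ℓ therefore selects a length scale directly from the data.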

Highlights

  • Suppose we are given N data points, x1, x2, ..., xN, each of which is a D-dimensional vector drawn from a smooth probability density Qtrue(x)

  • A prior p(Q|ℓ) that strongly penalizes fluctuations in Q below a chosen length scale ℓ is formulated in terms of a scalar field theory (see the sketch after this list)

  • Bialek et al. argued on theoretical grounds that the data themselves will typically select a natural value for the smoothness length scale ℓ due to the competing influences of goodness of fit and an Occam factor [11]
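
One common way to formulate such a prior, following the field-theory literature the highlights point to, is to write the density in terms of a real scalar field φ(x). The expression below is a hedged one-dimensional sketch; the derivative order α and the exact prefactor conventions are assumptions, not a quotation from the paper:

    Q(x) \;=\; \frac{e^{-\phi(x)}}{\int dx'\, e^{-\phi(x')}}, \qquad
    p(\phi \mid \ell) \;\propto\; \exp\!\left[ -\frac{\ell^{\,2\alpha-1}}{2} \int dx\, \big(\partial_x^{\alpha}\phi\big)^{2} \right]

Fluctuations of φ on scales shorter than ℓ carry a large action and are strongly suppressed, which is the sense in which the prior penalizes structure in Q below the length scale ℓ; the power of ℓ shown is the one that makes the exponent dimensionless in one dimension.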



Introduction

Suppose we are given N data points, x1, x2, ..., xN, each of which is a D-dimensional vector drawn from a smooth probability density Qtrue(x); the goal is to estimate Qtrue from these data. Adopting a prior p(Q|ℓ) that favors densities smooth on a length scale ℓ, one computes a Bayesian posterior p(Q|ℓ, data) identifying which densities are most consistent with both the data and the prior. The maximum a posteriori (MAP) density Qℓ, which maximizes p(Q|ℓ, data) and serves as an estimate of Qtrue, is computed as the solution to a nonlinear differential equation.
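
The sketch below illustrates, for a fixed length scale ℓ, how that MAP problem can be solved numerically on a one-dimensional grid. It is a minimal illustration under stated assumptions (an α = 1 gradient-squared penalty, free boundaries, and a generic quasi-Newton optimizer), not the published software accompanying the paper, and the function map_density_1d is a name made up for this example:

    # Minimal sketch of field-theoretic MAP density estimation at a fixed
    # smoothness length scale ell (assumptions: alpha = 1 prior, free boundaries).
    import numpy as np
    from scipy.optimize import minimize

    def map_density_1d(data, ell, grid_size=100, bbox=None):
        """Return grid centers and the MAP density Q_ell for smoothness scale ell."""
        data = np.asarray(data, dtype=float)
        N = data.size
        lo, hi = bbox if bbox is not None else (data.min(), data.max())
        edges = np.linspace(lo, hi, grid_size + 1)
        centers = 0.5 * (edges[:-1] + edges[1:])
        h = edges[1] - edges[0]
        counts, _ = np.histogram(data, bins=edges)
        R = counts / (N * h)                     # empirical density on the grid

        def neg_log_posterior(phi):
            # Smoothness penalty: (ell^2 / 2) * integral of (dphi/dx)^2
            dphi = np.diff(phi) / h
            prior = 0.5 * ell**2 * h * np.sum(dphi**2)
            # Negative log likelihood: N * integral(R * phi) + N * log Z
            logZ = np.log(h * np.sum(np.exp(-phi)))
            return prior + N * h * np.sum(R * phi) + N * logZ

        def gradient(phi):
            lap = np.zeros_like(phi)             # discrete second derivative of phi
            lap[1:-1] = (phi[2:] - 2 * phi[1:-1] + phi[:-2]) / h**2
            lap[0] = (phi[1] - phi[0]) / h**2    # one-sided terms at the boundaries
            lap[-1] = (phi[-2] - phi[-1]) / h**2
            Q = np.exp(-phi) / (h * np.sum(np.exp(-phi)))
            return -ell**2 * h * lap + N * h * (R - Q)

        phi0 = np.zeros(grid_size)               # start from the uniform density
        result = minimize(neg_log_posterior, phi0, jac=gradient, method="L-BFGS-B")
        phi = result.x
        Q = np.exp(-phi) / (h * np.sum(np.exp(-phi)))
        return centers, Q

    # Usage: estimate a density from samples of a two-component Gaussian mixture.
    rng = np.random.default_rng(0)
    samples = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(1, 1.0, 500)])
    x, Q = map_density_1d(samples, ell=0.5)
    print("integral of Q:", np.sum(Q) * (x[1] - x[0]))   # should be close to 1

Setting the gradient to zero and passing to the continuum gives an equation of the form ℓ² φ''(x) = N [R(x) − Q(x)] (up to convention-dependent constants), the kind of nonlinear differential equation referred to above; repeating the calculation over a range of ℓ values and comparing evidences is then what selects the natural length scale.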

