Abstract

Constraining physical models with more than 5-10 parameters is a widespread problem in fields like particle physics and astronomy. Generating the data needed to explore such a parameter space often requires large amounts of computational resources, and the common workaround of reducing the number of relevant physical parameters hampers the generality of the results. In this paper we show that this problem can be alleviated by the use of active learning. We illustrate this with examples from high energy physics, a field where simulations are often expensive and parameter spaces are high-dimensional. We show that the active learning techniques query-by-committee and query-by-dropout-committee allow for the identification of model points in interesting regions of high-dimensional parameter spaces (e.g. around decision boundaries). This makes it possible to constrain model parameters more efficiently than is currently done with the most common sampling algorithms, and to train better-performing machine learning models on the same amount of data. Code implementing the experiments in this paper can be found on GitHub.

Highlights


  • Constraining the parameters of high-dimensional physical models (e.g., finding which input parameters of a universe simulation yield a universe that looks like ours) is still a challenging problem

  • In this paper we approach this problem by exploring the use of active learning [9,24,25], an iterative method that applies machine learning to guide the sampling of new model points to specific regions of the parameter space


Introduction

In this paper we approach this problem by exploring the use of active learning [9,24,25], an iterative method that applies machine learning to guide the sampling of new model points towards specific regions of the parameter space. Active learning reduces the time spent on expensive simulations by evaluating only points that are expected to lie in regions of interest. Because the procedure is iterative, each iteration sharpens the resolution on the true boundary. For classification problems this results in points being sampled around decision boundaries, and thereby in a better resolution on those boundaries.
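The iterative procedure described above can be sketched as a pool-based query-by-committee loop. This is an illustrative toy, not the paper's implementation: a cheap stand-in function plays the role of the expensive simulation, the committee is formed by the trees of a random forest (mirroring the paper's random-forest experiments), and all names and settings (`simulate`, pool size, batch size) are assumptions chosen for the example.

```python
# Minimal pool-based query-by-committee sketch (illustrative assumptions only:
# toy "simulator", committee = trees of a random forest, arbitrary batch size).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def simulate(x):
    # Toy stand-in for an expensive simulation: a point is "allowed" (1)
    # if it lies inside the unit circle, "excluded" (0) otherwise.
    return (x[:, 0] ** 2 + x[:, 1] ** 2 < 1.0).astype(int)

# Small labelled seed set plus a large pool of unlabelled candidate points.
X = rng.uniform(-2, 2, size=(50, 2))
y = simulate(X)
pool = rng.uniform(-2, 2, size=(5000, 2))

for iteration in range(5):
    forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    # Committee disagreement: fraction of trees voting "allowed". Points with
    # a vote fraction near 0.5 lie closest to the decision boundary.
    votes = np.mean([tree.predict(pool) for tree in forest.estimators_], axis=0)
    idx = np.argsort(np.abs(votes - 0.5))[:20]    # 20 most contested points
    X = np.vstack([X, pool[idx]])
    y = np.concatenate([y, simulate(pool[idx])])  # run the "simulator" there
    pool = np.delete(pool, idx, axis=0)

print(X.shape)  # → (150, 2): 50 seed points + 5 iterations × 20 queried points
```

With each pass, new labelled points accumulate near the exclusion boundary rather than uniformly across the box, which is exactly the behaviour that makes active learning cheaper than random sampling. Query-by-dropout-committee follows the same loop but replaces the forest's trees with stochastic forward passes of a dropout-enabled neural network.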

Active learning
Random forest with a finite pool
Increase resolution of exclusion boundary
Random forest with an infinite pool
QBDC with an infinite pool
QBC with infinite pool for smaller parameter spaces
Identifying uncertain regions and steering new searches
Findings
Conclusion
