Distributed statistical inference with pyhf enabled through funcX

Matthew Feickert, Ben Galewsky, Giordon Stark, Lukas Heinrich

doi:10.1051/epjconf/202125102070

Abstract

In High Energy Physics, facilities that provide High Performance Computing environments offer an opportunity to efficiently perform the statistical inference required for analysis of data from the Large Hadron Collider, but they can pose problems with orchestration and efficient scheduling. The compute architectures at these facilities do not easily support the Python compute model, and the configuration and scheduling of batch jobs for physics often require expertise in multiple job scheduling services. The combination of the pure-Python libraries pyhf and funcX reduces a common problem in HEP analyses, performing statistical inference with binned models, from a task that would traditionally take multiple hours and bespoke scheduling to an on-demand (fitting) “function as a service” that can scalably execute across workers in just a few minutes, offering reduced time to insight and inference. We demonstrate execution of a scalable workflow using funcX to simultaneously fit 125 signal hypotheses from a published ATLAS search for new physics using pyhf, with a wall time of under 3 minutes. We additionally show performance comparisons for other physics analyses with openly published probability models and argue for a blueprint of fitting-as-a-service systems at HPC centers.
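The “function as a service” model in the abstract means a function is registered once and then invoked on demand, with each invocation returning a task handle whose result is collected later. The following is a minimal sketch of that pattern, assuming nothing about funcX internals: `LocalFunctionService` and `scale_signal` are hypothetical names, and a local thread pool stands in for remote workers. It is not the funcX SDK, although funcX's `FuncXClient` offers `register_function`, `run`, and `get_result` methods in a similar spirit (consult the funcX documentation for exact signatures and endpoint configuration).

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class LocalFunctionService:
    """Hypothetical in-process stand-in for a function-serving service.

    Functions are registered once and get an ID; each invocation returns a
    task ID whose result can be fetched later. A thread pool plays the role
    of remote workers. This is NOT the funcX SDK."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._functions = {}  # function_id -> callable
        self._tasks = {}      # task_id -> Future

    def register_function(self, func):
        """Store a callable and return a new function ID."""
        function_id = str(uuid.uuid4())
        self._functions[function_id] = func
        return function_id

    def run(self, *args, function_id, **kwargs):
        """Queue one invocation of a registered function; return a task ID."""
        task_id = str(uuid.uuid4())
        func = self._functions[function_id]
        self._tasks[task_id] = self._pool.submit(func, *args, **kwargs)
        return task_id

    def get_result(self, task_id, timeout=None):
        """Block until the task finishes and return its result."""
        return self._tasks[task_id].result(timeout=timeout)

    def shutdown(self):
        """Wait for queued tasks to finish and release the workers."""
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    def scale_signal(mu, signal):
        # Hypothetical stand-in for a fit: scale per-bin signal yields by mu.
        return [mu * s for s in signal]

    service = LocalFunctionService(max_workers=2)
    fid = service.register_function(scale_signal)
    tasks = [service.run(mu, [1.0, 2.0], function_id=fid) for mu in (0.5, 1.0, 2.0)]
    print([service.get_result(t) for t in tasks])
    # [[0.5, 1.0], [1.0, 2.0], [2.0, 4.0]]
    service.shutdown()
```

Separating registration from invocation is what lets an analyst submit many fits without writing batch scripts: the expensive setup happens once, and each call is just arguments plus an ID.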

Highlights

Researchers in High Energy Physics (HEP) and other fields are encouraged by their funding bodies to take advantage of the High Performance Computing (HPC) facilities constructed at various institutions.
For measurements in HEP based on binned data, the HistFactory [5] family of statistical models has been widely used for likelihood construction in Standard Model measurements (e.g. Refs. [6, 7]) as well as searches for new physics (e.g. Ref. [8]) and reinterpretation studies (e.g. Ref. [9]). pyhf is a pure-Python implementation of the HistFactory statistical model for multi-bin histogram-based analysis. pyhf computes interval estimates either through the asymptotic formulas of Ref. [10] or empirically through pseudoexperiments (“toys” in HEP parlance).
Without having to write any bespoke batch jobs, analysts can register and execute inference through a client Python API that still achieves the large performance gains over single-node execution that typically motivate the use of batch systems.
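The highlights mention the two ways pyhf obtains its results: asymptotic formulas or pseudoexperiments. The sketch below illustrates both on a deliberately simplified binned model. It is a product of Poisson terms with per-bin means μ·s_i + b_i and no nuisance parameters. That is an assumption for illustration: real HistFactory models add constraint terms for systematic uncertainties, and pyhf's implementation is far more general. The sketch computes the one-sided discovery test statistic q_0 = -2 ln[L(0)/L(μ̂)] with μ̂ ≥ 0. It then computes an asymptotic p-value, 1 - Φ(√q_0), and a toy-based p-value from background-only pseudoexperiments. All function names are hypothetical.

```python
import math
import random


def expected_counts(mu, signal, background):
    """Per-bin Poisson means mu * s_i + b_i (no nuisance parameters)."""
    return [mu * s + b for s, b in zip(signal, background)]


def nll(mu, observed, signal, background):
    """Binned Poisson negative log-likelihood, dropping the constant log(n_i!)."""
    lams = expected_counts(mu, signal, background)
    return sum(lam - n * math.log(lam) for n, lam in zip(observed, lams))


def best_fit_mu(observed, signal, background):
    """Maximum-likelihood mu restricted to mu >= 0, found by bisection.

    dNLL/dmu = sum_i [s_i - n_i s_i / (mu s_i + b_i)] increases with mu, so a
    non-negative slope at mu = 0 means the constrained optimum is mu = 0.
    Assumes every b_i > 0 and at least one s_i > 0."""
    def slope(mu):
        return sum(s - n * s / (mu * s + b)
                   for n, s, b in zip(observed, signal, background))

    if slope(0.0) >= 0.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while slope(hi) < 0.0:  # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):  # fixed number of halvings avoids float stalls
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def q0(observed, signal, background):
    """Discovery test statistic q0 = -2 ln[L(mu=0) / L(mu_hat)]."""
    mu_hat = best_fit_mu(observed, signal, background)
    return max(0.0, 2.0 * (nll(0.0, observed, signal, background)
                          - nll(mu_hat, observed, signal, background)))


def asymptotic_pvalue(q):
    """Asymptotic discovery p-value, 1 - Phi(sqrt(q0))."""
    return 0.5 * math.erfc(math.sqrt(q) / math.sqrt(2.0))


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def toy_pvalue(observed, signal, background, n_toys=5000, seed=0):
    """Fraction of background-only pseudoexperiments with q0 >= observed q0."""
    rng = random.Random(seed)
    q_obs = q0(observed, signal, background)
    hits = 0
    for _ in range(n_toys):
        toy = [sample_poisson(b, rng) for b in background]
        if q0(toy, signal, background) >= q_obs:
            hits += 1
    return hits / n_toys
```

Consider a single bin with `observed=[10]`, `signal=[5.0]`, and `background=[5.0]`. The best-fit μ is 1 and q_0 = 2(10 ln 2 - 5) ≈ 3.86. At counts this low, the toy-based p-value can differ noticeably from the asymptotic one, and that gap is the usual reason to fall back on pseudoexperiments. Each toy is independent, which is also why toy-based inference parallelizes so naturally across workers.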

Summary

Introduction

Researchers in High Energy Physics (HEP) and other fields are encouraged by their funding bodies to take advantage of the High Performance Computing (HPC) facilities constructed at various institutions. Through the use of funcX [4], a pure-Python, high-performance function serving system designed to orchestrate scientific workloads across heterogeneous computing resources, pyhf can be used as a highly scalable (fitting) function as a service (FaaS) on HPC systems. For measurements in HEP based on binned data (histograms), the HistFactory [5] family of statistical models has been widely used for likelihood construction in Standard Model measurements.

When workers are not yet available, the funcX service holds submitted tasks and executes as many of them as it can once workers become available. This helps to match the job profiles against a wide variety of compute environments. Environments that support containerization through Shifter or Singularity can specify a container in the setup. This is the easiest option to administer, but it requires that all tasks running on that endpoint depend only on these provided settings. The Kubernetes executor will launch worker pods with the requested container as needed to support task invocations.
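The queueing behavior described here, where tasks wait and then run as workers become available, can be mimicked locally. This sketch illustrates the idea under stated assumptions and does not reproduce funcX's scheduler. It evaluates a hypothetical grid of 125 signal points, echoing the 125 hypotheses in the abstract, on a fixed-size thread pool. Tasks beyond the worker count wait in the executor's queue, and a counter records the peak number running at once. The per-task work is the Asimov approximation for the median expected discovery significance of a single-bin counting experiment, Z = √(2((s + b) ln(1 + s/b) - s)), which stands in for a full pyhf fit. The grid values, the sleep time, and the function names are all hypothetical.

```python
import math
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def asimov_significance(s, b):
    """Median expected discovery significance for a single-bin counting
    experiment with signal s and known background b (Asimov approximation)."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def scan_with_limited_workers(hypotheses, n_workers, work_time=0.01):
    """Evaluate every hypothesis as an independent task on a fixed pool.

    Tasks beyond n_workers wait in the executor's queue and start as soon as
    a worker frees up. Returns (results, peak_concurrency)."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def task(hypothesis):
        nonlocal running, peak
        name, s, b = hypothesis
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(work_time)  # stand-in for the cost of a real fit
        z = asimov_significance(s, b)
        with lock:
            running -= 1
        return name, z

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = dict(pool.map(task, hypotheses))
    return results, peak


if __name__ == "__main__":
    # Hypothetical grid: 125 signal yields over a fixed background of 10 events.
    grid = [(f"point_{k:03d}", 0.5 * k, 10.0) for k in range(1, 126)]
    results, peak = scan_with_limited_workers(grid, n_workers=8)
    print(len(results), peak)
```

Because the hypotheses are independent, the wall time scales roughly with the number of tasks divided by the number of workers. Adding workers, as an HPC endpoint does, shortens the scan without any change to the per-task function.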

Current and Future FaaS Analysis Facilities

Conclusions

Journal: EPJ Web of Conferences	Publication Date: Jan 1, 2021
Citations: 3	License type: CC BY 4.0