Abstract

High Performance Computing (HPC) facilities offer High Energy Physics (HEP) an opportunity to efficiently perform the statistical inference required for analysis of data from the Large Hadron Collider, but they can pose problems with orchestration and efficient scheduling. The compute architectures at these facilities do not easily support the Python compute model, and the configuration and scheduling of batch jobs for physics often require expertise in multiple job scheduling services. Performing statistical inference with binned models is a common problem in HEP analyses that would traditionally take multiple hours and bespoke scheduling. The combination of the pure-Python libraries pyhf and funcX reduces it to an on-demand (fitting) “function as a service” that can scalably execute across workers in just a few minutes, offering reduced time to insight and inference. We demonstrate execution of a scalable workflow using funcX to simultaneously fit 125 signal hypotheses from a published ATLAS search for new physics using pyhf with a wall time of under 3 minutes. We additionally show performance comparisons for other physics analyses with openly published probability models and argue for a blueprint of fitting-as-a-service systems at HPC centers.

Highlights

  • Researchers in High Energy Physics (HEP) and other fields are encouraged by their funding bodies to take advantage of the High Performance Computing (HPC) facilities constructed at various institutions

  • For measurements in HEP based on binned data, the HistFactory [5] family of statistical models has been widely used for likelihood construction in Standard Model measurements (e.g. Refs. [6, 7]) as well as searches for new physics (e.g. Ref. [8]) and reinterpretation studies (e.g. Ref. [9]). pyhf is a pure-Python implementation of the HistFactory statistical model for multi-bin histogram-based analysis. pyhf computes interval estimates either with the asymptotic formulas of Ref. [10] or empirically through pseudoexperiments (“toys” in HEP parlance)

  • Without writing any bespoke batch jobs, analysts can register and execute inference through a client Python API while still achieving the large performance gains over single-node execution that typically motivate the use of batch systems
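
For readers unfamiliar with the model named in the highlights, the HistFactory likelihood and the quantities used for interval estimation can be written compactly. The notation below follows the form commonly given in the pyhf documentation and the asymptotic-formula literature; none of these symbols appear on this page, so treat them as assumptions: n_cb is the observed count in bin b of channel c, ν_cb the expected rate, η the free parameters, χ the constrained parameters with auxiliary data a_χ, μ the signal strength, and θ the nuisance parameters.

```latex
% HistFactory likelihood: Poisson terms over channels c and bins b,
% multiplied by constraint terms for the constrained parameters \chi.
f(\vec{n}, \vec{a} \mid \vec{\eta}, \vec{\chi})
  = \prod_{c \in \text{channels}} \prod_{b \in \text{bins}_c}
    \mathrm{Pois}\left(n_{cb} \mid \nu_{cb}(\vec{\eta}, \vec{\chi})\right)
    \prod_{\chi \in \vec{\chi}} c_{\chi}\left(a_{\chi} \mid \chi\right)

% Profile likelihood ratio for a tested signal strength \mu:
% numerator maximised over \theta at fixed \mu, denominator maximised globally.
\lambda(\mu) = \frac{L\left(\mu, \hat{\hat{\theta}}(\mu)\right)}{L\left(\hat{\mu}, \hat{\theta}\right)}

% CLs value built from the signal-plus-background and background-only p-values.
\mathrm{CL}_s = \frac{p_{s+b}}{1 - p_b}
```

The p-values can be obtained either from asymptotic distributions of a test statistic based on λ(μ) or from pseudoexperiments, which are the two routes the highlight above mentions.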

Introduction

Researchers in High Energy Physics (HEP) and other fields are encouraged by their funding bodies to take advantage of the High Performance Computing (HPC) facilities constructed at various institutions. Through use of funcX [4], a pure-Python high performance function serving system designed to orchestrate scientific workloads across heterogeneous computing resources, pyhf can be used as a highly scalable (fitting) function as a service (FaaS) on HPCs. For measurements in HEP based on binned data (histograms), the HistFactory [5] family of statistical models has been widely used for likelihood construction in Standard Model measurements.

The funcX service holds tasks until workers are available and then executes as many as it can. This helps to match the job profiles against a wide variety of compute environments. Environments that support containerization through Shifter or Singularity can specify a container in the setup. This is easiest to administer; it requires that all tasks running on that endpoint only depend on these provided settings. The Kubernetes executor will launch worker pods with the requested container as needed to support task invocations.
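
The fan-out pattern behind fitting as a service (submit one fit per signal hypothesis, let the service run as many as the available workers allow, then gather the results) can be illustrated with a minimal local sketch. This is not the funcX or pyhf API: as loudly stated assumptions, a standard-library thread pool stands in for a funcX endpoint, and an exact single-bin Poisson counting CLs calculation stands in for a pyhf HistFactory fit. All function names, yields, and counts below are hypothetical.

```python
"""Stand-in sketch: fan out one 'fit' per signal hypothesis and gather results.

Assumptions (not from the paper): a local thread pool plays the role of the
funcX endpoint, and a one-bin Poisson counting experiment plays the role of a
pyhf HistFactory fit. All names and numbers are hypothetical.
"""
import math
from concurrent.futures import ThreadPoolExecutor, as_completed


def poisson_cdf(n_obs, mean):
    """Return P(N <= n_obs) for N ~ Poisson(mean)."""
    term = math.exp(-mean)  # P(N = 0)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k  # P(N = k) from P(N = k - 1)
        total += term
    return total


def fit_hypothesis(name, signal, background, n_obs):
    """Toy 'fit': classic counting-experiment CLs = CL_{s+b} / CL_b."""
    cl_sb = poisson_cdf(n_obs, signal + background)
    cl_b = poisson_cdf(n_obs, background)
    return {"name": name, "CLs": cl_sb / cl_b}


def run_scan(hypotheses, background, n_obs, max_workers=4):
    """Submit every hypothesis to the pool and collect CLs values by name."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(fit_hypothesis, name, signal, background, n_obs)
            for name, signal in hypotheses.items()
        ]
        # Tasks wait in the pool's queue until a worker is free.
        for future in as_completed(futures):
            result = future.result()
            results[result["name"]] = result["CLs"]
    return results


if __name__ == "__main__":
    # 125 hypothetical signal points, echoing the size of the scan in the abstract.
    hypotheses = {f"signal_{i:03d}": 0.5 * (i + 1) for i in range(125)}
    cls_values = run_scan(hypotheses, background=50.0, n_obs=52)
    excluded = [name for name, value in cls_values.items() if value < 0.05]
    print(f"{len(cls_values)} hypotheses fitted, {len(excluded)} with CLs < 0.05")
```

In a real deployment the submit and gather steps would go through the funcX client and a configured endpoint, and the worker function would call pyhf; the shape of the loop is the part this sketch is meant to convey.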

Current and Future FaaS Analysis Facilities
Conclusions