Abstract

The Faster Analysis Software Taskforce (FAST) is a small, European group of HEP researchers that have been investigating and developing modern software approaches to improve HEP analyses. We present here an overview of the key product of this effort: a set of packages that allows a complete implementation of an analysis using almost exclusively YAML files. Serving as an analysis description language (ADL), this toolset builds on top of the evolving technologies from the Scikit-HEP and IRIS-HEP projects as well as industry-standard libraries such as Pandas and Matplotlib. Data processing starts with event-level data (the trees) and can proceed by adding variables, selecting events, performing complex user-defined operations and binning data, as defined in the YAML description. The resulting outputs (the tables) are stored as Pandas dataframes which can be programmatically manipulated and converted to plots or inputs for fitting frameworks. No longer just a proof-of-principle, these tools are now being used in CMS analyses, the LUX-ZEPLIN experiment, and by students on several other experiments. In this talk we will showcase these tools through examples, highlighting how they address the different experiments’ needs, and compare them to other similar approaches.
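To illustrate the workflow the abstract describes, a processing description for fast-carpenter might look roughly like the sketch below: a list of stages (defining variables, applying cuts, binning into a dataframe), each configured in its own YAML section. The stage class names, variable expressions, and binning fields here are illustrative assumptions rather than the exact fast-carpenter schema.

```yaml
# Illustrative sketch of a fast-carpenter processing config (field names assumed)
stages:
  - new_vars: fast_carpenter.Define        # add derived variables
  - event_cuts: fast_carpenter.CutFlow     # select events
  - histograms: fast_carpenter.BinnedDataframe  # bin into a Pandas dataframe

new_vars:
  variables:
    - HT: Jet_pt.sum()          # scalar sum of jet pT per event

event_cuts:
  selection:
    All:
      - nJet >= 2
      - HT > 200

histograms:
  binning:
    - {in: HT, out: ht, bins: {low: 0, high: 1000, nbins: 50}}
  weights: weight
```

The resulting binned dataframe ("the tables") can then be manipulated in Pandas or fed to plotting and fitting tools, as described above.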

Highlights

  • Producing high-quality research papers in High-Energy Physics (HEP) involves processing petabytes of data, applying the latest knowledge for the specific experiment, and statistically evaluating the end results and their uncertainties. This process often involves the use of experiment-specific software frameworks and community packages as well as researcher-written code.


  • The key aims of the Faster Analysis Software Taskforce (FAST) are to: a) reduce the amount of researcher-written code to minimize mistakes, b) lower the entry requirements for new researchers, c) make it easier to share, and d) provide an abstraction between the analysis itself and the processing system that runs over the data

  • The mechanism used to load stages is extensible; if fast-carpenter does not provide a stage that you need for your analysis, it is easy to write one and include it in your workflow.
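A custom stage can be sketched as an ordinary Python class that the framework instantiates and calls once per data chunk. The class name, constructor parameters, and the `chunk.tree` access pattern below are illustrative assumptions about the stage interface, not the exact fast-carpenter API.

```python
# Hypothetical custom fast-carpenter stage (interface details assumed):
# a class whose `event` method is called for each chunk of event data.

class JetEnergySum:
    """Add a per-event sum of jet energies as a new variable."""

    def __init__(self, name, out_dir, source="Jet_energy", output="total_jet_energy"):
        self.name = name
        self.out_dir = out_dir    # provided by the framework for any file output
        self.source = source      # input branch to read
        self.output = output      # name of the new variable to create

    def event(self, chunk):
        # Read the jagged per-event jet energies from the chunk's tree
        energies = chunk.tree[self.source]
        # Attach the per-event sum as a new variable
        chunk.tree[self.output] = [sum(e) for e in energies]
        return True  # continue processing with subsequent stages
```

Once such a class is importable, it can be referenced from the YAML `stages` list like any built-in stage.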



Introduction

Producing high-quality research papers in High-Energy Physics (HEP) involves processing petabytes of data, applying the latest knowledge for the specific experiment, and statistically evaluating the end results and their uncertainties. The key aims of the Faster Analysis Software Taskforce (FAST) are to: a) reduce the amount of researcher-written code to minimize mistakes, b) lower the entry requirements for new researchers, c) make it easier to share, and d) provide an abstraction between the analysis itself and the processing system that runs over the data. Analyses must also cope with evolving inputs: new datasets can become available as a physics run continues, simulations are extended, or existing data are reprocessed through early stages. To manage this, input data are described as one or more datasets in YAML files that are generated and interpreted using the fast-curator package.
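A dataset description of the kind fast-curator generates and interprets might look roughly like the following. The field names and file paths are illustrative assumptions rather than the exact fast-curator schema.

```yaml
# Illustrative sketch of a fast-curator dataset description (schema assumed)
datasets:
  - name: signal_mc          # label used throughout the analysis
    eventtype: mc            # e.g. simulation vs. real data
    nevents: 100000          # total events across all files
    files:
      - root://eospublic.cern.ch//store/user/example/signal_0.root
      - root://eospublic.cern.ch//store/user/example/signal_1.root
```

Because these files are generated programmatically, they can be refreshed automatically as new data or simulation becomes available, without editing the analysis description itself.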


