Abstract

The CMS experiment at CERN has released research-quality data from particle collisions at the LHC since 2014. Almost all data from the first LHC run in 2010–2012 with the corresponding simulated samples are now in the public domain, and several scientific studies have been performed using these data. This paper summarizes the available data and tools, reviews the challenges in using them in research, and discusses measures to improve their usability.

Highlights

  • The first data release of the CMS experiment was announced in 2014, bringing research-quality particle collision data into the public domain for the first time

  • As of 2021, more than 2 PB of data from the 2010–2012 run period are available to users outside the CMS collaboration, served through the CERN Open Data portal (CODP) [2]

  • List of validated data: To select properly validated events, each collision dataset has a corresponding list of validated runs, which is to be applied in every analysis to filter out data that did not pass the data qualification criteria. This list is in JavaScript Object Notation (JSON) format, and tools are available in CMSSW to apply it as a filter in the analysis phase
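
As a rough illustration of the validated-runs filter described above, the following Python sketch applies such a list to a handful of events. It is not the CMSSW tool itself. It assumes that the JSON file maps each run number to a list of inclusive [first, last] ranges of luminosity sections; the run numbers, the event fields `run` and `lumi`, and the helper names `load_validated_runs` and `is_validated` are all hypothetical and chosen only for the example.

```python
import json


def load_validated_runs(text):
    """Parse a validated-runs JSON string into {run: [(first, last), ...]}.

    Assumed layout (not confirmed by the text above): keys are run numbers
    stored as strings, values are lists of inclusive luminosity-section ranges.
    """
    raw = json.loads(text)
    return {int(run): [tuple(r) for r in ranges] for run, ranges in raw.items()}


def is_validated(validated, run, lumi):
    """Return True if (run, lumi) lies inside a validated range for that run."""
    return any(first <= lumi <= last for first, last in validated.get(run, []))


# Hypothetical file content; real lists are distributed with each dataset.
EXAMPLE_JSON = '{"1001": [[1, 50], [60, 75]], "1002": [[1, 20]]}'
validated = load_validated_runs(EXAMPLE_JSON)

# Hypothetical events, each tagged with its run and luminosity section.
events = [
    {"run": 1001, "lumi": 10},  # inside the first range of run 1001
    {"run": 1001, "lumi": 55},  # falls in the gap between the two ranges
    {"run": 1003, "lumi": 5},   # run is not listed at all
    {"run": 1002, "lumi": 20},  # on the inclusive upper edge of the range
]

# Keep only the events that pass the validated-runs filter.
selected = [e for e in events if is_validated(validated, e["run"], e["lumi"])]
print(selected)
```

Parsing the list once into a dictionary keeps the per-event check cheap, which matters because the filter has to run over every event in an analysis.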

Summary

Introduction

The first data release of the CMS experiment was announced in 2014, bringing research-quality particle collision data into the public domain for the first time. Further studies have since been performed, including searches for new particles [5, 6], Standard Model analyses [7], and several studies on machine learning and methodology. The usability of these data, both now and in the long-term future, is the key factor in assessing the success of the releases. Achieving it often requires a delicate balance between the constraints inherent to old data, the state-of-the-art tools of the moment, and the long-term vision. Feedback from CMS open data users and current limitations and challenges are discussed in Sec. 4, and Sec. 5 addresses the measures taken or foreseen to further improve the usability of CMS open data.

CMS open data
Data products
Software and associated information
Using CMS open data
Virtual machine images
Container images
Data access
Condition data
User feedback and challenges
User feedback
Data complexity
Software complexity
Scalability
Long-term usability
Measures to improve usability
Documentation and training
Preserved workflows as examples
Data analysis in cloud environments
Findings
Outlook