Abstract

The challenges posed by the HL-LHC era are not limited to the sheer amount of data to be processed: optimizing the analysts' experience will also bring important benefits to the LHC communities, in terms of total resource needs, user satisfaction and reduced time to publication. At the Italian National Institute for Nuclear Physics (INFN) a portable software stack for analysis has been proposed, based on cloud-native tools and capable of providing users with a fully integrated analysis environment for the CMS experiment. The main characterizing traits of the solution are its user-driven design and its portability to any cloud resource provider. All this is made possible by an evolution towards a “python-based” framework, which enables the usage of a set of open-source technologies widely adopted in both cloud-native and data-science environments. In addition, a “single sign-on”-like experience is available thanks to the standards-based integration of INDIGO-IAM with all the tools. The integration of compute resources is done through the customization of a JupyterHub solution, able to spawn identity-aware user instances ready to access data with no further setup actions. The integration with GPU resources is also available, designed to sustain increasingly widespread ML-based workflows. Seamless connections between the user UI and batch/big-data processing frameworks (Spark, HTCondor) are possible. Finally, the experiment data access latency is reduced thanks to the integrated deployment of a scalable set of caches, as developed in the context of the ESCAPE project, and as such compatible with future scenarios where a data lake will be available for the research community. The outcome of the evaluation of such a solution in action is presented, showing how a real CMS analysis workflow can make use of the infrastructure to achieve its results.
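
As an illustration of the kind of standards-based integration mentioned above, the sketch below shows how an OpenID Connect login against an INDIGO-IAM instance could be wired into JupyterHub using the generic OAuthenticator. The URLs, client credentials and endpoint paths are illustrative placeholders, not the configuration actually used in the deployment described here.

    # jupyterhub_config.py (illustrative sketch; `c` is the configuration object
    # provided by JupyterHub when this file is loaded)
    from oauthenticator.generic import GenericOAuthenticator

    c.JupyterHub.authenticator_class = GenericOAuthenticator

    # Hypothetical INDIGO-IAM instance; the endpoint paths follow the usual
    # OpenID Connect layout, but in a real setup they should be taken from the
    # issuer's .well-known/openid-configuration document.
    iam_issuer = "https://iam.example.org"
    c.GenericOAuthenticator.login_service = "INDIGO-IAM"
    c.GenericOAuthenticator.client_id = "jupyterhub-client"        # placeholder
    c.GenericOAuthenticator.client_secret = "change-me"            # placeholder
    c.GenericOAuthenticator.oauth_callback_url = "https://hub.example.org/hub/oauth_callback"
    c.GenericOAuthenticator.authorize_url = iam_issuer + "/authorize"
    c.GenericOAuthenticator.token_url = iam_issuer + "/token"
    c.GenericOAuthenticator.userdata_url = iam_issuer + "/userinfo"
    c.GenericOAuthenticator.scope = ["openid", "profile", "email"]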

Highlights

  • As the technologies and the challenges evolve, a new approach is needed to provide HL-LHC communities with all the tools they need to get their analysis work done

  • The end-user data analysis workflow is evolving in many respects, with a new event data format called NanoAOD [1] designed by the CMS Collaboration [2] to satisfy the needs of a large fraction of physics analyses, with a per-event size of the order of 1 kB while still containing all the top-level information typically used in the last steps of an analysis (a minimal NanoAOD read-out sketch follows this list)

  • Several initiatives are arising in this context, such as those at CERN [4] and in the US [5]; at the Italian National Institute for Nuclear Physics (INFN) an effort is under way to leverage modern cloud-native paradigms as building blocks for the analysis infrastructure, with the main objective of deploying a platform that can be challenged and optimized in preparation for the HL-LHC era, fully compatible with the resource-provisioning model and service-portfolio composition strategy of INFN-Cloud
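
To make the NanoAOD bullet above more concrete, the following sketch reads a few top-level branches from a NanoAOD file with uproot and awkward-array. The file name and the specific selection are hypothetical, chosen only to show how the flat, column-oriented layout is consumed from Python.

    # Minimal NanoAOD read-out sketch (hypothetical file name and selection).
    import uproot
    import awkward as ak

    with uproot.open("nano_sample.root") as f:      # placeholder file
        events = f["Events"]                        # NanoAOD stores a single flat "Events" tree
        # Read only the columns needed, keeping the memory footprint small.
        muons = events.arrays(["Muon_pt", "Muon_eta", "Muon_phi"], library="ak")
        # Example selection: keep events with at least one muon above 25 GeV.
        selected = muons[ak.any(muons.Muon_pt > 25, axis=1)]
        print(len(selected), "events pass the selection")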

Summary

Introduction

As the technologies and the challenges evolve, a new approach is needed to provide HL-LHC communities with all the tools they need to get their analysis work done. In terms of computing infrastructure, this evolution brings the opportunity for R&D around new solutions that, on the one hand, offer the possibility to exploit models based on, e.g., Python-based WebUIs and, on the other hand, allow the throughput to be optimized, a key aspect for analysis at CMS. This translates into a model which foresees the usage of a well-equipped node with specialized hardware, such as NVMe storage and many CPU and GPU cores, enabling analysis activities at the MHz event-processing level and beyond. The primary motivations of the proposed architecture are to satisfy the shift toward interactivity, as opposed to the GRID batch approach, and to maximize the throughput while analysing the experiment data. While this can be done using a single node with specialized hardware, it is important to exploit scale-out capabilities as well and, possibly, to embed everything in the very same deployment. This approach does not preclude the use of Singularity at the application level.
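
The following sketch, which is not taken from the project itself, illustrates the single-node versus scale-out idea under stated assumptions: the same Dask-based processing function can use all the cores of one well-equipped node or be pointed at a remote scheduler to fan out, with the per-partition work standing in for the actual event selection and histogramming.

    # Illustrative sketch only: local interactive execution vs. scale-out with Dask.
    import dask.bag as db
    from dask.distributed import Client

    def process_chunk(chunk):
        # Placeholder for the real per-chunk work (event selection, histogramming, ...).
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        # Local mode: a cluster spanning the cores of the interactive node.
        # To scale out, point the client at a remote scheduler instead, e.g.
        # Client("tcp://scheduler.example.org:8786") -- hypothetical address.
        client = Client()
        data = db.from_sequence(range(1_000_000), npartitions=64)
        total = data.map_partitions(lambda part: [process_chunk(part)]).sum().compute()
        print("result:", total)
        client.close()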

Architecture overview
Identity and access management
Caching
On-demand computation and scale out
Autoscaling on custom metrics
Portability of the system and deployment strategy
First user experiences
Current experiences and lessons learnt
Conclusion and plans