Abstract

For many scientific projects, data management is an increasingly complicated challenge. The number of data-intensive instruments generating unprecedented volumes of data is growing and their accompanying workflows are becoming more complex. Their storage and computing resources are heterogeneous and are distributed at numerous geographical locations belonging to different administrative domains and organisations. These locations do not necessarily coincide with the places where data is produced nor where data is stored, analysed by researchers, or archived for safe long-term storage. To fulfil these needs, the data management system Rucio has been developed to allow the high-energy physics experiment ATLAS at the LHC to manage its large volumes of data in an efficient and scalable way. But ATLAS is not alone, and several diverse scientific projects have started evaluating, adopting, and adapting the Rucio system for their own needs. As the Rucio community has grown, many improvements have been introduced, customisations have been added, and many bugs have been fixed. Additionally, new dataflows have been investigated and operational experiences have been documented. In this article we collect and compare the common successes, pitfalls, and oddities that arose in the evaluation efforts of multiple diverse experiments, and compare them with the ATLAS experience. This includes the high-energy physics experiments Belle II and CMS, the neutrino experiment DUNE, the scattering radar experiment EISCAT3D, the gravitational wave observatories LIGO and VIRGO, the SKA radio telescope, and the dark matter search experiment XENON.

Highlights

  • The original motivation for a common data management solution is the anticipated scarcity of data resources (storage and network) in the mid-term future

  • Ensuring more efficient use of the available data resources across multiple experiments has become a strategic goal for many communities, as it allows storage and network to be allocated according to science needs rather than administrative domains

  • Rucio can be connected with different Workflow Management Systems (WMS) and already supports PanDA[6], the ATLAS WMS (see the sketch below)
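
As an illustration of how such an integration could look, the Python sketch below uses the Rucio client API to register a dataset, request replicas via an RSE expression (placement by storage attributes rather than administrative domain), and look up where the replicas are, operations a workflow management system would typically drive. This is a minimal sketch, not an integration recipe: the scope, dataset, file, and RSE attribute names are hypothetical, and a configured Rucio server and account are assumed.

    from rucio.client import Client

    client = Client()

    # Register a dataset and attach an output file produced by the workflow
    # (the file itself is assumed to be registered on its storage element already).
    scope, dataset = "user.jdoe", "run_output_dataset"        # hypothetical identifiers
    client.add_dataset(scope=scope, name=dataset)
    client.attach_dids(scope=scope, name=dataset,
                       dids=[{"scope": scope, "name": "output_000001.root"}])

    # Request two replicas of the dataset anywhere matching an RSE expression,
    # i.e. placement driven by storage attributes rather than administrative domains.
    client.add_replication_rule(dids=[{"scope": scope, "name": dataset}],
                                copies=2,
                                rse_expression="tier=2&type=DISK")  # hypothetical RSE attributes

    # A WMS such as PanDA can later query where the replicas are before brokering jobs.
    for replica in client.list_replicas([{"scope": scope, "name": dataset}]):
        print(replica["name"], list(replica["rses"].keys()))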

Summary

Belle II

The Belle II experiment[8] is a particle physics experiment designed to study the properties of B mesons. In the first stage of the migration, the current data management APIs are extended with an implementation that uses Rucio under the hood. This is mostly transparent to the rest of Belle II and allows both data management backends to work in parallel during the transition phase. This stage still relies on a legacy file catalogue and does not take full advantage of Rucio and its functionalities, since it is by definition limited to the APIs currently in use. It allows the BNL team to gain experience operating the DIRAC WMS with Rucio in a production environment. The DUNE community has expressed interest in contributing to these developments in the near future.
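
The following Python sketch illustrates this dual-backend idea under stated assumptions: the existing data-management API is kept, and each registration is served by both the legacy file catalogue and Rucio in parallel. Except for the Rucio client itself, all class, method, and scope names here are invented for illustration and do not describe the actual Belle II interfaces.

    from rucio.client import Client


    class LegacyCatalogue:
        """Stand-in for the pre-existing file catalogue (hypothetical)."""

        def register(self, lfn, metadata):
            print("legacy catalogue: registered", lfn)


    class DataManagementAPI:
        """Same interface the experiment already calls; Rucio now works under the hood."""

        def __init__(self, scope="belle"):                  # hypothetical Rucio scope
            self.scope = scope
            self.legacy = LegacyCatalogue()
            self.rucio = Client()

        def register_file(self, lfn, rse, size, adler32):
            # The legacy backend stays authoritative during the transition phase ...
            self.legacy.register(lfn, {"size": size, "adler32": adler32})
            # ... while the same registration is mirrored into Rucio as a replica on the given RSE.
            self.rucio.add_replicas(rse=rse, files=[{"scope": self.scope,
                                                     "name": lfn,
                                                     "bytes": size,
                                                     "adler32": adler32}])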

EISCAT3D

Square Kilometre Array

Lessons learnt

Summary and conclusions