Abstract

The European-funded ESCAPE project will prototype a shared solution to computing challenges in the context of the European Open Science Cloud. It targets Astronomy and Particle Physics facilities and research infrastructures and focuses on developing solutions for handling Exabyte-scale datasets. The DIOS work package aims at delivering a Data Infrastructure for Open Science. Such an infrastructure would be a non-HEP-specific implementation of the data lake concept elaborated in the HSF Community White Paper and endorsed in the WLCG Strategy Document for HL-LHC. The science projects in ESCAPE are in different phases of evolution. While HL-LHC can leverage 15 years of experience of distributed computing in WLCG, other sciences are only now building their computing models. This contribution describes the architecture of a shared ecosystem of services fulfilling the needs of the ESCAPE community in terms of data organisation, management and access. The backbone of such a data lake will consist of several storage services operated by the partner institutes and connected through reliable networks. Data management and organisation will be orchestrated through Rucio. A layer of caching and latency-hiding services, supporting various access protocols, will serve the data to heterogeneous facilities, from conventional Grid sites to HPC centres and Cloud providers. The authentication and authorisation system will be based on tokens. For the success of the project, DIOS will integrate open source solutions that have demonstrated reliability and scalability at the multi-petabyte scale. Such services will be configured, deployed and complemented to cover the use cases of the ESCAPE sciences, which will be further developed during the project.
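The caching and latency-hiding layer mentioned above can be illustrated with a minimal read-through cache. This is only a sketch under stated assumptions: the class names `StorageEndpoint` and `ReadThroughCache`, the in-memory dictionaries, and the file names are all hypothetical, and the code does not use or reproduce the API of Rucio, xCache, or any of the storage systems named in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class StorageEndpoint:
    """Hypothetical stand-in for one storage service in the data lake."""
    name: str
    files: dict[str, bytes] = field(default_factory=dict)
    reads: int = 0  # counts how often the endpoint itself was contacted

    def read(self, lfn: str) -> bytes:
        self.reads += 1
        return self.files[lfn]


@dataclass
class ReadThroughCache:
    """Illustrative cache sitting between a compute site and the lake."""
    endpoints: list[StorageEndpoint]
    _store: dict[str, bytes] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def read(self, lfn: str) -> bytes:
        # Serve from the local copy when one exists (latency hiding).
        if lfn in self._store:
            self.hits += 1
            return self._store[lfn]
        # Otherwise fetch from the first endpoint holding the file
        # and keep a copy for later requests.
        self.misses += 1
        for endpoint in self.endpoints:
            if lfn in endpoint.files:
                data = endpoint.read(lfn)
                self._store[lfn] = data
                return data
        raise FileNotFoundError(lfn)


# Assumed example data: two endpoints, each holding one file.
site_a = StorageEndpoint("site-a", {"run1/events.dat": b"abc"})
site_b = StorageEndpoint("site-b", {"run2/events.dat": b"xyz"})
cache = ReadThroughCache([site_a, site_b])

first = cache.read("run2/events.dat")   # miss: fetched from site-b
second = cache.read("run2/events.dat")  # hit: served from the cache
```

In this toy setup the second read never reaches the storage endpoint, which is the effect a caching layer is meant to have for sites far from the data. A real deployment would also need eviction, consistency checks and token-based authorisation, none of which are modelled here.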

Highlights

  • The European Science Cluster of Astronomy and Particle physics ESFRI research infrastructure – ESCAPE [1] – is a European Union funded project in the context of Horizon 2020 [2]

  • We plan to build the datalake leveraging a heterogeneous set of storage solutions: dCache [4], DPM [5], EOS [6], StoRM [7] and xrootd [8] at a minimum. Such technologies have been deployed for many years in the Worldwide LHC Computing Grid (WLCG) [9] infrastructure and have demonstrated their capability to operate at the hundred-petabyte scale

  • We identified the xCache technology [20] as the most promising option fulfilling the requirements of the ESCAPE


Summary

Introduction

The European Science Cluster of Astronomy and Particle physics ESFRI research infrastructure – ESCAPE [1] – is a European Union funded project in the context of Horizon 2020 [2]. The cluster is formed by science projects with Exabyte-scale computing and storage needs in the 2020s, and the main goal of ESCAPE is to prototype a digital infrastructure for those needs. ESCAPE should ensure that the sciences it represents drive the development and evolution of the European Open Science Cloud – EOSC [3]. The goal of the ESCAPE Work Package 2 (WP2 DIOS - Data Infrastructure for Open Science) is to build a cloud of data services, often referred to as a datalake. The datalake should serve as the core infrastructure to support open data and enable the FAIR principles, by providing a flexible and scalable infrastructure to store and access scientific data while optimizing the total cost of ownership.

Datalake architecture
Datalake integration and deployment
Conclusions