Abstract

Contemporary scientific experiments produce significant amounts of data, as well as scientific publications based on these data. Since the volumes of both are constantly growing, it becomes increasingly difficult to establish the connection between a given paper and the underlying data. Yet such an association is crucial for many tasks, such as validating the scientific results presented in a paper, comparing different approaches to a problem, or simply understanding the state of some area of science. The authors of this paper work on the Data Knowledge Base (DKB) R&D project, initiated in 2016 to address this issue for the ATLAS experiment at CERN. The project aims to develop a software environment providing storage and a coherent representation of the basic information objects. In this paper the authors present a metadata model developed for the ATLAS experiment, the architecture of the DKB system, and its main components. Special attention is paid to the Kafka-based implementation of the ETL subsystem and to the mechanism for extracting meta-information from the texts of ATLAS publications.
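To illustrate what one stage of such a Kafka-based ETL pipeline might look like, below is a minimal sketch of a step that reads publication texts from one topic, extracts candidate dataset identifiers, and publishes the result to another topic. This is not the project's actual implementation: the topic names, the JSON message schema, and the simplified dataset-name pattern are assumptions made for this example (it uses the kafka-python client).

import json
import re

from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

# Simplified pattern for ATLAS dataset names such as "mc16_13TeV.361107...".
# Illustrative only: the real nomenclature is considerably richer.
DATASET_RE = re.compile(r"\b(?:mc|data)\d{2}_\d+TeV(?:\.[\w-]+)+")


def extract_datasets(text):
    """Return candidate dataset identifiers found in a publication text."""
    return sorted(set(DATASET_RE.findall(text)))


def run_stage():
    # Hypothetical topics: raw publication texts in, extracted metadata out.
    consumer = KafkaConsumer(
        "papers-fulltext",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    # Each message is assumed to carry {"doc_id": ..., "text": ...}.
    for msg in consumer:
        doc = msg.value
        producer.send(
            "papers-metadata",
            {"doc_id": doc["doc_id"], "datasets": extract_datasets(doc["text"])},
        )


if __name__ == "__main__":
    run_stage()

Chaining several such stages through Kafka topics decouples the extraction, transformation, and loading steps, which is the general idea behind an ETL pipeline of this kind.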

Highlights

  • The life cycle of a scientific experiment has changed considerably over the last decades

  • The Data Knowledge Base (DKB) R&D project, initiated in 2016, addresses the problem of connecting ATLAS publications with the underlying data

  • Data and metadata related to the experiment must be stored and remain available for years or even decades, so that scientists can plan further research, compare results with those of other approaches, and reprocess experimental data to maintain reproducibility

Introduction

The life cycle of a scientific experiment has changed considerably over the last decades. Data and metadata related to an experiment must be stored and remain available for years or even decades, as scientists may need to consult them to plan further research, compare results with those achieved by different approaches, or initiate reprocessing of some experimental data to maintain reproducibility. The Data Knowledge Base (DKB) project [1], dealing with the metadata of high energy physics (HEP) experiments (with ATLAS as an example), is one of the attempts to address this problem. The DKB project was started in 2016 as a joint effort of the ATLAS experiment at CERN, the Kurchatov Institute, and Tomsk Polytechnic University. The initial idea of the project was to provide fast and user-friendly access to relevant scientific information regarding a physics analysis: the data used in the research, the hardware/software configuration, the related publications, and so on.
