Abstract

The Beijing Spectrometer (BESIII) experiment has produced hundreds of billions of events. The traditional event-wise access of the BESIII Offline Software System is inefficient for the low-rate selective access typical of a physics analysis. This paper introduces an event-based data management system (EventDB), which alleviates the problems of inefficient data processing and low resource utilization. First, an indexing system based on a NoSQL database is designed. By extracting specified attributes of events, the events of interest to physicists are selected and stored in the database, while the actual event data remain in ROOT files. For hot events, the actual event data can also be cached in EventDB to improve access performance. Applying the EventDB system requires a change to the data analysis workflow of HEP experiments: the analysis program queries the corresponding event index from the database, then gets the event data from the database if the event is cached, or from ROOT files if it is not. Finally, tests on more than one hundred billion physics events show that query speed is greatly improved over traditional file-based data management systems.
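The lookup workflow described in the abstract can be illustrated with a minimal sketch. It is not the EventDB implementation: plain Python dictionaries stand in for the NoSQL index, the hot-event cache, and the ROOT files, and every name (`EventStore`, `IndexEntry`, `n_tracks`, `run_0001.root`) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class IndexEntry:
    """One index record: selected attributes plus the event's location."""
    event_id: int
    attributes: Dict[str, float]
    file_name: str   # stand-in for the ROOT file that holds the event
    entry: int       # position of the event inside that file


class EventStore:
    """Toy model of index + cache + file storage (all in memory)."""

    def __init__(self) -> None:
        self.index: Dict[int, IndexEntry] = {}    # stand-in for the NoSQL index
        self.cache: Dict[int, bytes] = {}         # stand-in for cached hot events
        self.files: Dict[str, List[bytes]] = {}   # stand-in for ROOT files

    def add_file(self, file_name: str,
                 events: List[Tuple[int, Dict[str, float], bytes]]) -> None:
        """Store event payloads in a 'file' and index their attributes."""
        self.files[file_name] = [payload for _, _, payload in events]
        for pos, (event_id, attrs, _) in enumerate(events):
            self.index[event_id] = IndexEntry(event_id, attrs, file_name, pos)

    def cache_event(self, event_id: int) -> None:
        """Copy a hot event's payload into the cache."""
        e = self.index[event_id]
        self.cache[event_id] = self.files[e.file_name][e.entry]

    def query(self, predicate: Callable[[Dict[str, float]], bool]) -> List[IndexEntry]:
        """Select events using the index only; no event data is read."""
        return [e for e in self.index.values() if predicate(e.attributes)]

    def fetch(self, e: IndexEntry) -> Tuple[bytes, str]:
        """Return the payload from the cache if present, else from the file."""
        if e.event_id in self.cache:
            return self.cache[e.event_id], "cache"
        return self.files[e.file_name][e.entry], "file"


store = EventStore()
store.add_file("run_0001.root", [
    (1, {"n_tracks": 2}, b"event-1"),
    (2, {"n_tracks": 5}, b"event-2"),
    (3, {"n_tracks": 4}, b"event-3"),
])
store.cache_event(3)  # mark event 3 as hot

selected = store.query(lambda a: a["n_tracks"] >= 4)
results = [(e.event_id, *store.fetch(e)) for e in selected]
```

In this sketch only the two events that pass the selection are read at all, and event 3 is served from the cache rather than from its file, which is the access pattern the abstract describes.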

Highlights

  • As the scale of high-energy physics (HEP) experiments continues to expand, more and more data is produced

  • The EventDB system described in this paper focuses on the event index and pre-selection, and does not change the BESIII storage and analysis model

  • We have set up a test bed composed of two sites, Beijing and Chengdu, to evaluate the performance of the EventDB system


Summary

Introduction

As the scale of high-energy physics (HEP) experiments continues to expand, more and more data are produced. Most HEP experiment data are managed at file granularity, with each file containing several events. File-based data management faces many challenges from the rapid growth of experimental data and the emergence of new technologies. 2) If a site does not have sufficient storage space and network bandwidth, it is difficult to run data analysis tasks that need a large amount of input data. In this case, only a subset of the data should be transferred, on demand.

Related works
The design and implementation
Event-oriented data transfer service
Test bed and results
Findings
Summary
