Abstract

In CMS, data access and management are organized around the data tier model: a static definition of what subset of event information is available in a particular dataset, realized as a collection of files. We present a novel data management model that obviates the need for data tiers by exploding files into individual event data product objects. The objects are stored and retrieved through Ceph S3 technology, with a layout designed to minimize data and metadata volume while maximizing data processing parallelism. We demonstrate that this object data format shows promise in reducing total storage requirements while allowing more flexible data access patterns. We present performance benchmarks of a prototype data processing framework that uses this object data format with a test Ceph cluster; they show good scaling behavior in a distributed processing task.
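As an illustration of the idea of storing event data products as individual objects, the following is a minimal sketch, not the layout used in this work. The key scheme (`dataset/product/first-last`), the `stride` parameter, the `InMemoryStore` class standing in for an S3 bucket, and all function names are assumptions made for this example.

```python
import ast


def object_key(dataset, product, first_event, last_event):
    # Hypothetical key scheme: one object per (product, event range).
    # Zero-padding keeps lexicographic order equal to event order.
    return f"{dataset}/{product}/{first_event:08d}-{last_event:08d}"


class InMemoryStore:
    """Stand-in for an S3 bucket (assumption): maps key -> bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        self._objects[key] = body

    def get(self, key):
        return self._objects[key]

    def list(self, prefix):
        # Mimics a prefix listing; returns matching keys in sorted order.
        return sorted(k for k in self._objects if k.startswith(prefix))


def write_events(store, dataset, events, stride):
    """Write each data product of each batch of `stride` events as its own object."""
    for start in range(0, len(events), stride):
        batch = events[start:start + stride]
        products = sorted({name for event in batch for name in event})
        for name in products:
            # Toy serialization; a real format would be binary and compressed.
            payload = repr([event.get(name) for event in batch]).encode()
            key = object_key(dataset, name, start, start + len(batch) - 1)
            store.put(key, payload)


def read_product(store, dataset, product):
    """Read one data product for all events without touching other products."""
    values = []
    for key in store.list(f"{dataset}/{product}/"):
        values.extend(ast.literal_eval(store.get(key).decode()))
    return values


if __name__ == "__main__":
    events = [{"muons": [i], "jets": [i, i + 1]} for i in range(5)]
    store = InMemoryStore()
    write_events(store, "run1", events, stride=2)
    print(read_product(store, "run1", "muons"))
```

In this sketch, a reader that needs only `muons` fetches the objects under a single prefix and never reads the `jets` objects, so no separate reduced copy of the dataset has to be defined in advance. Each event-range object can also be handed to a different worker.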
