Abstract

As a well-established, large-scale distributed storage system, dCache is used to manage and serve huge amounts of data collected by high energy physics, astrophysics and photon science experiments. Based on a microservices-like architecture, dCache is built as a modular distributed system in which each component provides a different core functionality. These services communicate by passing serialized messages to each other, a core behavior whose performance properties can consequently affect the entire system. This paper compares and evaluates different data serialization protocols with the objective of replacing and improving upon Java Object Serialization (JOS), which has increasingly proven insufficiently performant for encoding messages. The criteria for choosing a new framework are collected, analyzed and formalized. The primary motivation for replacing Java serialization for encoding dCache messages is to increase the general speed of message passing and thereby reduce the round-trip time for user requests. Emphasis is also placed on schema evolution capabilities and framework usability. Approaches for generalizing (de)serialization speed and size measurements based on data structure complexity are introduced, and criteria for measuring documentation, learning curve, maintainability and introduction effort are defined. Finally, several selected serialization protocols are evaluated and compared accordingly, concluding with a recommendation for a suitable JOS replacement.

Highlights

  • The dCache software [1] is an open-source distributed storage system written in Java, which uses a microservices-like architecture to provide location-independent access to data

  • The primary motivation for replacing Java serialization is increasing the general speed of message-passing and thereby reducing the round-trip time for user requests

  • Within dCache, Java Object Serialization is used to serialize the messages exchanged between services into a binary format


Summary

Introduction

The dCache software [1] is an open-source distributed storage system written in Java, which uses a microservices-like architecture to provide location-independent access to data. It is designed to support a wide range of use cases, such as high-throughput data ingest, to be dynamically scalable to hundreds of petabytes, and to be deployable in heterogeneous systems and on commodity hardware. It is easy to integrate with other systems because it can communicate over several protocols for accessing data and enabling authentication, and supports [...]. A significant portion of the time needed for internal communication between services is spent on serializing and deserializing messages. The primary motivation for replacing Java serialization is to increase the general speed of message passing and thereby reduce the round-trip time for user requests. This paper is the summary of a larger scientific thesis [3].
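To make the cost in question concrete, the following is a minimal sketch of a Java Object Serialization round trip using only the standard `java.io` classes. The `PingMessage` type and the `JosRoundTrip` class are hypothetical and invented for illustration; they are not dCache classes, and the single timing printed by `main` is not a benchmark, since it ignores JIT warm-up and measurement noise.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class JosRoundTrip {

    // Hypothetical message type for illustration; not an actual dCache message.
    static class PingMessage implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        final String payload;

        PingMessage(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // Encodes an object into the JOS binary stream format.
    static byte[] serialize(Serializable message) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(message);
        }
        // The stream is closed (and therefore flushed) before the bytes are read.
        return buffer.toByteArray();
    }

    // Decodes a JOS byte stream back into an object.
    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PingMessage original = new PingMessage(42L, "hello");

        long start = System.nanoTime();
        byte[] encoded = serialize(original);
        PingMessage decoded = (PingMessage) deserialize(encoded);
        long elapsedNanos = System.nanoTime() - start;

        System.out.println("encoded size: " + encoded.length + " bytes");
        System.out.println("round trip: " + elapsedNanos + " ns (single unwarmed run)");
        System.out.println("payload preserved: " + original.payload.equals(decoded.payload));
    }
}
```

Encoded size and (de)serialization time are the two quantities the paper sets out to measure. A meaningful comparison between protocols would repeat such a round trip many times, after warm-up, over messages of varying structural complexity.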

Related Work
Current Message Serialization in dCache
Criteria for a New Serialization Protocol in dCache
Serialization Protocols to be Evaluated
Evaluation Scenarios
Environment and Tools
Results
Summary and Outlook