Abstract
As a well-established, large-scale distributed storage system, dCache is used to manage and serve huge amounts of data collected by high energy physics, astrophysics and photon science experiments. Based on a microservices-like architecture, dCache is built as a modular distributed system in which each component provides a different core functionality. These services communicate by passing serialized messages to each other, so the performance of this core behavior can affect the entire system. This paper compares and evaluates different data serialization protocols with the objective of replacing and improving upon Java Object Serialization (JOS), which has increasingly proven insufficiently performant for encoding messages. The criteria for choosing a new framework are collected, analyzed and formalized. The primary motivation for replacing Java serialization for encoding dCache messages is increasing the general speed of message-passing and thereby reducing the round-trip time for user requests. Emphasis is also placed on schema evolution capabilities and framework usability. Approaches for generalizing (de)serialization speed and size measurements based on data structure complexity are introduced, and criteria for measuring documentation, learning curve, maintainability and introduction effort are defined. Finally, several selected serialization protocols are evaluated and compared accordingly, concluding with a recommendation for a suitable JOS replacement.
Highlights
The dCache software [1] is an open-source distributed storage system written in Java, which uses a microservices-like architecture to provide location-independent access to data
The primary motivation for replacing Java serialization is increasing the general speed of message-passing and thereby reducing the round-trip time for user requests
Within dCache, Java Object Serialization (JOS) is used to serialize the messages passed between services to a binary format
Summary
The dCache software [1] is an open-source distributed storage system written in Java, which uses a microservices-like architecture to provide location-independent access to data. It is designed to support a wide range of use cases, including high-throughput data ingest, to scale dynamically to hundreds of petabytes, and to be deployable in heterogeneous systems and on commodity hardware. It is easy to integrate with other systems because it can communicate over several protocols for accessing data and enabling authentication. A significant portion of the time needed for internal communication between services is spent on serializing and deserializing messages. The primary motivation for replacing Java serialization is increasing the general speed of message-passing and thereby reducing the round-trip time for user requests. This paper is the summary of a larger scientific thesis [3].
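To make the cost of serializing and deserializing messages concrete, the following is a minimal sketch of a Java Object Serialization round trip using only the standard `java.io` classes. The `ExampleMessage` class, its fields and the class name `JosRoundTrip` are hypothetical and invented for illustration; they are not dCache types, and the single untimed-warm-up measurement is not the benchmarking method used in the paper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class JosRoundTrip {

    // Hypothetical message type (assumption for illustration, not a dCache class).
    static final class ExampleMessage implements Serializable {
        private static final long serialVersionUID = 1L;
        final String poolName;
        final long freeBytes;

        ExampleMessage(String poolName, long freeBytes) {
            this.poolName = poolName;
            this.freeBytes = freeBytes;
        }
    }

    // Encode an object to its JOS binary form.
    static byte[] serialize(Serializable message) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(message);
        }
        return buffer.toByteArray();
    }

    // Decode a JOS byte array back into an object.
    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ExampleMessage original = new ExampleMessage("pool-a", 42L);

        long start = System.nanoTime();
        byte[] encoded = serialize(original);
        ExampleMessage decoded = (ExampleMessage) deserialize(encoded);
        long elapsedNanos = System.nanoTime() - start;

        // The encoded form carries class metadata in addition to the field values,
        // so it is larger than the raw payload (6 characters plus one 8-byte long).
        System.out.println("encoded size (bytes): " + encoded.length);
        System.out.println("round trip (ns, single unwarmed run): " + elapsedNanos);
        System.out.println("fields preserved: "
                + (decoded.poolName.equals(original.poolName)
                   && decoded.freeBytes == original.freeBytes));
    }
}
```

A single timing like this is dominated by JIT warm-up and class loading, so it only illustrates where the cost arises. A meaningful comparison between serialization frameworks would need repeated, warmed-up measurements over messages of varying complexity.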