Distributed storage systems store redundant data to maintain a constant level of data availability and to increase the system's resilience to failures. Such systems usually use pure replication or RAID-based methods as redundancy schemes. In this paper, we study the communication cost of a distributed data storage system that uses Maximum Distance Separable (MDS) erasure codes. Our focus is on reducing the cost of the one-to-many communication used in data reconstruction/repair initialization and update operations. We propose two communication approaches for these operations in distributed storage systems: a Steiner tree approach and a multi-shortest path approach. We analyse both approaches theoretically and empirically. Our theoretical results indicate that the Steiner tree approach uses fewer messages, whereas the multi-shortest path approach takes less time for data reconstruction/repair initialization operations. For the data update process, on the other hand, the Steiner tree approach performs better on both the message and time metrics. Our experimental results support these theoretical findings. Users can therefore choose between the two approaches depending on their needs and priorities.
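
The message/time trade-off between the two approaches can be illustrated on a toy network. The following is a minimal Python sketch under assumptions that are not taken from the paper: the network is an unweighted graph, "messages" counts link transmissions, "time" counts hops until the last target is reached, the multi-shortest path approach sends one separate unicast per target, and the Steiner tree approach sends one message per tree edge. The function names, the graph, and the brute-force Steiner search (feasible only for very small graphs) are all hypothetical illustrations, not the authors' algorithms or cost model.

```python
from collections import deque
from itertools import combinations


def bfs_dist(adj, src, allowed=None):
    """Hop distances from src, optionally restricted to the node set `allowed`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist and (allowed is None or v in allowed):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def multi_shortest_path_cost(adj, src, targets):
    """Assumed model: one independent unicast per target along a shortest path."""
    dist = bfs_dist(adj, src)
    messages = sum(dist[t] for t in targets)  # every hop of every path is a message
    time = max(dist[t] for t in targets)      # paths are used in parallel
    return messages, time


def steiner_tree_cost(adj, src, targets):
    """Assumed model: one message per edge of a minimum Steiner tree.

    Brute force: find the smallest set of extra relay nodes that connects the
    source and all targets. In an unweighted graph, a tree over n nodes has
    n - 1 edges, so the fewest nodes gives the fewest messages.
    """
    terminals = {src, *targets}
    others = [n for n in adj if n not in terminals]
    for k in range(len(others) + 1):
        depths = []
        for extra in combinations(others, k):
            nodes = terminals | set(extra)
            dist = bfs_dist(adj, src, allowed=nodes)
            if len(dist) == len(nodes):           # chosen nodes are connected
                depths.append(max(dist.values()))  # depth of a BFS tree from src
        if depths:
            return len(terminals) + k - 1, min(depths)
    raise ValueError("targets are not reachable from the source")


# Hypothetical 5-node network: s is the initiator, t1 and t2 are storage nodes.
adj = {
    "s": ["a", "c"],
    "a": ["s", "t1"],
    "c": ["s", "t2"],
    "t1": ["a", "t2"],
    "t2": ["c", "t1"],
}

print(multi_shortest_path_cost(adj, "s", ["t1", "t2"]))  # (4, 2)
print(steiner_tree_cost(adj, "s", ["t1", "t2"]))         # (3, 3)
```

In this toy case the tree needs three transmissions but three hops, while the two separate shortest paths need four transmissions but only two hops, which matches the direction of the reconstruction/repair result stated above. It says nothing about the update operation, whose cost model is not described here.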