Repair Strategies for Mobile Storage Systems

Gokhan Calis, O. Ozan Koyluoglu, Swetha Shivaramaiah, Loukas Lazos

doi:10.1109/tcc.2019.2914436

Open Access

https://doi.org/10.1109/tcc.2019.2914436

Journal: IEEE Transactions on Cloud Computing
Publication Date: Oct 1, 2021
Citations: 3
License type: publisher-specific-oa

Affiliation: University of Arizona

Abstract

We study the data reliability problem for devices forming a dynamic distributed storage system. Such systems are commonplace in traditional cloud storage applications, where storage node failures and updates are frequent. We consider the application of regenerating codes for file maintenance. Such codes require less bandwidth to regenerate lost data fragments than file replication or reconstruction. We investigate threshold-based repair strategies, in which data repair is initiated after a threshold number of data fragments have been lost. We show that in a low departure-to-repair rate regime, a strategy in which repairs are initiated after several nodes have left the system outperforms one in which repairs are initiated after a single node departure. This ordering is reversed when node turnover is high. We further compare distributed and centralized repair strategies and derive the optimal repair threshold that minimizes the average repair cost per unit of time. In addition, we examine cooperative repair strategies and show that they improve performance. We investigate several models for the time needed for node repair, including a simple fixed-time model and a more realistic model that takes into account the number of repaired nodes. Finally, we investigate an extended model in which additional failures are allowed during the repair process.
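
The trade-off behind a threshold-based repair strategy can be illustrated with a toy calculation. The sketch below is not the model from the paper; it is a minimal renewal-style example under assumptions made up for illustration: nodes depart at a constant rate `departure_rate`, a repair starts once `m` nodes have left, the repair takes a fixed time `repair_time` with no further departures during it, and a repair of `m` nodes costs `fixed_cost + per_node_cost * m`. All function and parameter names are hypothetical.

```python
def average_cost_rate(m, departure_rate, repair_time, fixed_cost, per_node_cost):
    """Average repair cost per unit of time for repair threshold m.

    Toy model (an assumption, not the paper's): one cycle lasts the expected
    time for m departures (m / departure_rate) plus a fixed repair time, and
    costs a fixed overhead plus a per-node cost for each repaired node.
    """
    if m < 1:
        raise ValueError("threshold m must be at least 1")
    if departure_rate <= 0:
        raise ValueError("departure_rate must be positive")
    cycle_length = m / departure_rate + repair_time
    cycle_cost = fixed_cost + per_node_cost * m
    return cycle_cost / cycle_length


def best_threshold(max_threshold, departure_rate, repair_time, fixed_cost, per_node_cost):
    """Return the threshold in 1..max_threshold with the lowest cost rate."""
    return min(
        range(1, max_threshold + 1),
        key=lambda m: average_cost_rate(
            m, departure_rate, repair_time, fixed_cost, per_node_cost
        ),
    )


if __name__ == "__main__":
    # Compare a slow-departure and a fast-departure setting (hypothetical values).
    for rate in (0.1, 10.0):
        m_star = best_threshold(4, rate, repair_time=1.0, fixed_cost=5.0, per_node_cost=1.0)
        print(f"departure_rate={rate}: best threshold = {m_star}")
```

In this toy setting, waiting for several departures is cheaper when departures are slow relative to the repair time, and repairing after a single departure is cheaper when departures are fast. This mirrors the qualitative behavior described in the abstract, though the actual cost and repair-time models in the paper may differ.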

Full Text