Abstract

Erasure codes offer a storage-efficient redundancy mechanism for maintaining data availability in storage clusters, but they incur high network traffic and long recovery times during failure repair. Existing studies aim to reduce the recovery time in heterogeneous networks. However, because of bandwidth heterogeneity, the recovery time is always limited by the lowest-bandwidth link between nodes. Recently, Bai et al. proposed a parallel pipeline tree technique, called PPT, which reduces recovery time by exploiting a special bandwidth gap to bypass the low-bandwidth link. However, we find that PPT’s gap-based bypassing method causes network congestion and competition. In this paper, we propose SMFRepair, a single-node multi-level forwarding repair technique that uses idle nodes to bypass low-bandwidth links without incurring network congestion or competition. We further propose MSRepair, a multi-node scheduling repair technique. MSRepair finds a recovery solution that schedules the parallel repair of multiple nodes and transfers data over links with as much bandwidth as possible, with the primary objective of minimizing the recovery time. Large-scale Mininet simulations and real-world experiments on Amazon EC2 show that, compared with state-of-the-art repair techniques, the single-node recovery time can be reduced by up to 36.65% and the multi-node recovery time by up to 55.10%.
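To make the bottleneck argument concrete, the following is a minimal sketch, not the SMFRepair algorithm itself. It assumes pipelined forwarding, so that the throughput of a relayed path equals the minimum bandwidth of its hops, and it considers only a single level of forwarding. The node names, the bandwidth values, and the helpers `relay_bandwidth` and `transfer_time` are hypothetical and chosen purely for illustration.

```python
def relay_bandwidth(bw, src, dst, idle_nodes):
    """Return (bandwidth, relay) for src -> dst, optionally via one idle node.

    bw is a nested dict of link bandwidths in Mb/s (illustrative values).
    Assumption: forwarding is pipelined, so a two-hop path is limited by
    its slower hop. relay is None when the direct link is at least as good.
    """
    best_bw, best_relay = bw[src][dst], None
    for r in idle_nodes:
        via = min(bw[src][r], bw[r][dst])  # bottleneck of the two-hop path
        if via > best_bw:
            best_bw, best_relay = via, r
    return best_bw, best_relay


def transfer_time(chunk_mb, bandwidth_mbps):
    """Seconds needed to move chunk_mb megabytes at bandwidth_mbps megabits/s."""
    return chunk_mb * 8 / bandwidth_mbps


# Hypothetical three-node example: the direct link A -> B is slow,
# while the idle node C has fast links to both A and B.
bw = {
    "A": {"B": 100, "C": 800},
    "C": {"B": 600},
}

direct_bw = bw["A"]["B"]
best_bw, relay = relay_bandwidth(bw, "A", "B", idle_nodes=["C"])

print(f"direct: {direct_bw} Mb/s, {transfer_time(64, direct_bw):.2f} s per 64 MB chunk")
print(f"via {relay}: {best_bw} Mb/s, {transfer_time(64, best_bw):.2f} s per 64 MB chunk")
```

In this toy setting the relayed path is limited by the 600 Mb/s hop from C to B rather than by the 100 Mb/s direct link, which is the intuition behind using idle nodes to bypass a low-bandwidth link.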
