ESet: Placing Data Towards Efficient Recovery for Large-Scale Erasure-Coded Storage Systems

Chengjian Liu, Hai Liu, Yiu-Wing Leung, Xiaowen Chu

doi:10.1109/icccn.2016.7568521

Abstract

Erasure coding has been extensively deployed in distributed storage systems to provide high reliability at low storage overhead. However, erasure coding requires much more disk I/O than replication to recover a damaged data block, resulting in very long data recovery times. Data placement algorithms can be tailored to speed up the data recovery process by exploiting I/O parallelism. However, existing algorithms that achieve good I/O parallelism for replication cannot be applied directly to erasure-coded storage systems, and other algorithms designed for both replication-based and erasure-coded storage systems overlook the importance of recovery I/O parallelism, which may jeopardize the service quality and reliability of these systems. In this paper, we present a data placement strategy named ESet that provides efficient recovery for each host in a distributed storage system. We define a configurable parameter, the overlapping factor, that lets system administrators easily achieve the desired recovery I/O parallelism. Our simulation results show that ESet significantly improves data recovery performance and, by distributing data and code blocks across different failure domains, does so without violating the reliability requirement.
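The recovery cost gap described in the abstract can be illustrated with a small back-of-the-envelope model. This is only a sketch under stated assumptions, not the paper's algorithm or simulator: it assumes an MDS code such as Reed-Solomon with k data blocks per stripe, an idealized setting in which recovery reads are spread evenly over the source disks, and no network bottleneck. The function names and the numbers in the usage lines are hypothetical.

```python
def blocks_read_to_repair(scheme: str, k: int = 1) -> int:
    """Number of surviving blocks that must be read to rebuild one lost block."""
    if scheme == "replication":
        return 1  # copy any one surviving replica
    if scheme == "erasure":
        return k  # assumption: an MDS (k, m) code decodes from any k surviving blocks
    raise ValueError(f"unknown scheme: {scheme}")


def recovery_time(lost_blocks: int, reads_per_block: int,
                  block_read_seconds: float, parallel_sources: int) -> float:
    """Idealized recovery time when reads are spread evenly over the source disks."""
    total_reads = lost_blocks * reads_per_block
    return total_reads * block_read_seconds / parallel_sources


# Hypothetical example: 1000 lost blocks, k = 6, 0.5 s to read one block.
k = 6
serial = recovery_time(1000, blocks_read_to_repair("erasure", k), 0.5, 1)
spread = recovery_time(1000, blocks_read_to_repair("erasure", k), 0.5, 30)
print(serial, spread)
```

In this model, erasure coding multiplies the read volume by k relative to replication, and the only way to win that time back is to increase the number of disks that serve recovery reads in parallel, which is the quantity a placement strategy can influence.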

Full Text