Minimizing the Update Complexity of Facebook HDFS-RAID Locally Repairable Code

Mehrtash Mehrabi, Majid Khabbazian, Masoud Ardakani

doi:10.1109/vtcfall.2017.8288123

Abstract

Erasure codes have recently been adopted in real-world distributed storage systems (DSSs) such as Google File System, Microsoft Azure Storage, and Facebook HDFS-RAID for data reliability. When designing erasure codes for DSSs, special attention is given to the costs of data handling operations such as repair and update. For example, locally repairable codes (LRCs) are designed and used in DSSs to allow low-cost repair of failed blocks. Update complexity, defined as the number of blocks that need to be updated when an information block is changed, is another design parameter. It can be seen as a measure of the computation, I/O, and networking costs of updating an information block in a DSS. Since many applications update information frequently, lowering the update complexity can lower the power consumption of a DSS. In this work, we study the update complexity of LRCs. Based on our study, we propose an improvement over the LRC used by Facebook HDFS-RAID. Keeping the same code parameters, including length, storage overhead, minimum distance, and cost of repair (locality), we improve the update complexity by more than 16%. Moreover, we show that achieving a lower update complexity with these parameters is impossible.
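As a rough illustration of the definition above, the following sketch counts, for each information block, how many stored blocks depend on it in a linear code. It assumes the code is described by a k-by-n generator matrix whose row i lists the coefficients with which information block i enters each stored block. The matrix `G_TOY` and the function name `update_counts` are hypothetical and chosen only for illustration; this is not the HDFS-RAID code, and reporting both the worst case and the average is an assumption, since the abstract does not say which is meant.

```python
def update_counts(generator):
    """For each information block (one row per block), count the stored
    blocks whose value depends on it, i.e. the nonzero entries in the row.
    In a systematic code this count includes the information block itself."""
    return [sum(1 for coeff in row if coeff != 0) for row in generator]


# Hypothetical toy (n=6, k=3) systematic code over GF(2).
# Columns 0-2 store the information blocks, columns 3-5 store parities.
G_TOY = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]

counts = update_counts(G_TOY)
worst_case = max(counts)             # blocks rewritten in the worst case
average = sum(counts) / len(counts)  # blocks rewritten on average

print("per-block counts:", counts)
print("worst case:", worst_case, "average:", average)
```

Under this reading, a sparser generator matrix means fewer blocks to rewrite per update, which is the quantity the abstract proposes to reduce while holding length, overhead, minimum distance, and locality fixed.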

Full Text