Explicit Minimum Storage Regenerating Codes

Zhiying Wang, Itzhak Tamo, Jehoshua Bruck

doi:10.1109/tit.2016.2553675

Abstract

In distributed storage, a file is stored in a set of nodes and protected by erasure-correcting codes. A regenerating code is a type of code with two properties: first, it can reconstruct the entire file in the presence of any r node erasures, for some specified integer r; second, it can efficiently repair an erased node from any subset of the remaining nodes of a given size. In the repair process, the amount of information transmitted from each node, normalized by the storage size per node, is termed the repair bandwidth (fraction). When the storage size per node is minimized, the repair bandwidth is lower bounded by 1/r, where r is the number of parity nodes. A code attaining this lower bound is said to have optimal repair. We consider codes with minimum storage size per node and optimal repair, called minimum storage regenerating (MSR) codes. In particular, if an MSR code has r parities and any r erasures occur, then the original file can be reconstructed by transmitting all the information from the remaining nodes. On the other hand, if only one erasure occurs, only a fraction 1/r of the information in each remaining node needs to be transmitted. If we view each node as a vector, or a column, over some field, then the code forms a 2-D array. Given the column length l and the number of parities r, we explicitly construct high-rate MSR codes. The number of systematic nodes in our construction is (r + 1) log_r l, which is larger than in previously known results. In addition, we construct MSR codes with other desirable properties: first, codes with low complexity when the information is updated, and second, codes with low access cost, i.e., low storage node I/O, during repair.
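To make the two properties concrete, here is a minimal Python sketch under stated assumptions. It uses a hypothetical, hand-checked toy binary code with n = 4 nodes, k = 2 systematic nodes, r = 2 parities and column length l = 2; it is chosen only for illustration and is not the construction of this paper. The sketch checks by brute force that any r = 2 erasures leave the file recoverable, and that each systematic node can be repaired by downloading l/r = 1 symbol from each of the other three nodes (3 symbols instead of the k·l = 4 needed to rebuild the whole file). The names `encode`, `repair_A`, `repair_B`, `paper_systematic_nodes` and `repair_download` are hypothetical; the last two only evaluate the abstract's formulas, assuming l is a power of r.

```python
from itertools import combinations, product

# Hypothetical toy code, NOT the construction in this paper:
# n = 4 nodes, k = 2 systematic, r = 2 parities, column length l = 2, over GF(2).
def encode(a1, a2, b1, b2):
    """Return the four node columns; '^' is addition in GF(2)."""
    A = (a1, a2)
    B = (b1, b2)
    P1 = (a1 ^ b1, a2 ^ b2)
    P2 = (a2 ^ b1, a1 ^ a2 ^ b2)
    return [A, B, P1, P2]


def any_r_erasures_recoverable(r=2):
    """Brute force: for every set of r erased nodes, distinct files must
    leave distinct contents on the surviving nodes (so the file is recoverable)."""
    files = list(product((0, 1), repeat=4))
    for erased in combinations(range(4), r):
        alive = [i for i in range(4) if i not in erased]
        seen = {tuple(encode(*f)[i] for i in alive) for f in files}
        if len(seen) != len(files):
            return False
    return True


def repair_A(B, P1, P2):
    """Rebuild A = (a1, a2) from ONE symbol of each other node (l/r = 1)."""
    b2 = B[1]    # sent by B
    s1 = P1[1]   # a2 + b2, sent by P1
    s2 = P2[1]   # a1 + a2 + b2, sent by P2
    a2 = s1 ^ b2
    a1 = s2 ^ a2 ^ b2
    return (a1, a2), 3  # repaired column, symbols downloaded


def repair_B(A, P1, P2):
    """Rebuild B = (b1, b2) from ONE symbol of each other node (l/r = 1)."""
    a2 = A[1]    # sent by A
    s1 = P1[1]   # a2 + b2, sent by P1
    s2 = P2[0]   # a2 + b1, sent by P2
    return (s2 ^ a2, s1 ^ a2), 3  # (b1, b2), symbols downloaded


def paper_systematic_nodes(r, l):
    """k = (r + 1) * log_r(l) as stated in the abstract; assumes l is a power of r."""
    m, x = 0, 1
    while x < l:
        x *= r
        m += 1
    if x != l:
        raise ValueError("l must be a power of r")
    return (r + 1) * m


def repair_download(k, r, l):
    """Return (optimal-repair download, whole-file download) in symbols.
    Optimal: each of the n - 1 = k + r - 1 survivors sends l / r symbols
    (assumes r divides l). Whole file: k full columns, i.e. k * l symbols."""
    return (k + r - 1) * l // r, k * l


if __name__ == "__main__":
    assert any_r_erasures_recoverable()
    for f in product((0, 1), repeat=4):
        A, B, P1, P2 = encode(*f)
        assert repair_A(B, P1, P2) == (A, 3)
        assert repair_B(A, P1, P2) == (B, 3)
    k = paper_systematic_nodes(2, 8)
    print(k, repair_download(k, 2, 8))  # 9 (40, 72)
```

For example, with r = 2 and l = 8 the abstract's formula gives k = 9 systematic nodes, and repairing one node from all 10 survivors downloads 40 symbols rather than the 72 needed to rebuild the file. Only systematic-node repair is demonstrated for the toy code; its parity nodes are not claimed to have optimal repair.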
