Abstract

By integrating cyber, physical, and social spaces, cyber-physical-social systems (CPSS) bring more convenience to humans. For practical applications and user convenience, it is essential that the Big Data produced in CPSS be stored in the distributed storage systems of CPSS. In this paper, we study the fault-tolerance scheme for distributed storage systems of CPSS and propose a framework that can recover multiple failed nodes simultaneously. With regard to the reliability of storage nodes in distributed storage systems, research on locally repairable codes has mostly focused on repairing failed nodes within each repair group. However, when entire repair groups have failed, existing locally repairable codes cannot repair more than one failed group. We therefore propose local codes with cooperative repair that can recover more than one failed group. Specifically, the proposed local codes are constructed from minimum storage regenerating (MSR) codes and have an interleaving structure among the local codes, so that the parity symbols of any local code can be generated from the MSR codes in its two adjacent local codes. Taking advantage of this property, more than one failed local group can be repaired cooperatively by their adjacent local groups with lower repair locality. Furthermore, the key parameters of local codes with cooperative repair are derived. Theoretical analysis and simulation results show that, compared with previous codes with local regeneration, our codes incur higher bandwidth overhead when repairing failed nodes but offer advantages in storage overhead and repair locality, for the repair of either a single failed node or one failed local group. Moreover, for a single failed local group, local codes with cooperative repair achieve almost the same tradeoff curve of storage overhead and bandwidth overhead as MSR-local codes and minimum bandwidth regenerating local (MBR-local) codes.

Highlights

  • Since the traditional cloud storage model runs in a centralized storage manner, a single node failure may lead to the collapse of the system

  • We present an explicit construction of local codes with cooperative repair (LCCR)

  • Cyber-physical-social systems (CPSS), which involve a complex hyperspace of cyber, physical, and social spaces, have seen significant adoption in the past few years; they are secure by design and exemplify a distributed computing system with high fault tolerance


Summary

INTRODUCTION

Since the traditional cloud storage model runs in a centralized storage manner, a single node failure may lead to the collapse of the system. MSR-local codes and MBR-local codes, proposed by Kamath et al. [15], have the ability to repair failed local codes, even though their initial purpose was to simplify node repair. Based on their construction, MSR-local codes and MBR-local. We present an explicit construction of local codes with cooperative repair (LCCR). Theoretical and numerical analyses show that the proposed LCCR improve repair locality for node failures and local group failures, and achieve almost the same tradeoff curve of storage overhead and bandwidth overhead as MSR-local codes and MBR-local codes for a single failed local group.
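To make the storage/bandwidth tradeoff mentioned above more concrete, the following minimal sketch computes the two extreme operating points of a classical regenerating code: the MSR point and the MBR point. It uses the standard formulas for an (n, k, d) regenerating code storing a file of size M, with per-node storage α and total repair bandwidth γ. It does not reproduce the LCCR construction or the paper's own overhead metrics. The function names and the example values of M, k, and d are assumptions chosen for illustration only.

```python
from fractions import Fraction


def _check(k: int, d: int) -> None:
    # A repair contacts d helper nodes; the standard formulas assume d >= k >= 1.
    if k < 1 or d < k:
        raise ValueError("parameters must satisfy d >= k >= 1")


def msr_point(M: int, k: int, d: int):
    """Return (alpha, gamma) at the minimum-storage regenerating point.

    alpha = M / k, gamma = M * d / (k * (d - k + 1)).
    """
    _check(k, d)
    alpha = Fraction(M, k)
    gamma = Fraction(M * d, k * (d - k + 1))
    return alpha, gamma


def mbr_point(M: int, k: int, d: int):
    """Return (alpha, gamma) at the minimum-bandwidth regenerating point.

    alpha = gamma = 2 * M * d / (k * (2 * d - k + 1)).
    """
    _check(k, d)
    gamma = Fraction(2 * M * d, k * (2 * d - k + 1))
    return gamma, gamma


# Hypothetical parameters, not taken from the paper.
M, k, d = 12, 3, 5
for name, point in (("MSR", msr_point), ("MBR", mbr_point)):
    alpha, gamma = point(M, k, d)
    print(f"{name}: alpha = {alpha}, gamma = {gamma}")
```

For these hypothetical values, the MSR point stores α = 4 per node and downloads γ = 20/3 per repair, while the MBR point has α = γ = 5: MSR minimizes storage at the cost of more repair traffic, and MBR does the reverse.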

BACKGROUND
MSR-LOCAL CODES AND MBR-LOCAL CODES
PERFORMANCE COMPARISON
STORAGE OVERHEAD
REPAIR OF A SINGLE FAILED NODE
CONCLUSION
