Abstract

Most data centers around the world (e.g., at Google, Facebook, and Amazon) use replication as the primary safeguard to provide redundancy against failures, which are common in such environments. Diverse research has addressed replication within data centers, spanning data consistency (quorum/consensus algorithms), degraded reads, data placement (subject to network topology, physical distribution across multiple data centers, etc.), and NoSQL functionality such as exploiting data locality when executing Hadoop MapReduce tasks. The major drawback of replication is its storage requirement: with the common three-way scheme, raw capacity amounts to 300% of the actual data size. Erasure coding is tipped to be the best alternative for providing redundancy within data centers, and adopting it requires revisiting many established concepts. Performance and bandwidth are major concerns with erasure codes, and other open areas, such as security and energy saving, need adapted solutions. Erasure coding also allows for greater flexibility, for instance by allowing some data center nodes to be switched off or by offering more options for spreading load. The high-level objective of the PhD is thus to derive a holistic data management solution for erasure-coding-based distributed storage systems.
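The storage-overhead comparison above can be sketched with simple arithmetic. The following is an illustrative example, not taken from the paper; the (k, m) erasure-code parameters are assumptions chosen to match common Reed-Solomon configurations:

```python
def raw_storage(data_tb, replicas=None, k=None, m=None):
    """Raw capacity (TB) needed to store `data_tb` TB of user data.

    Pass either `replicas` (n-way replication) or the pair (k, m) of a
    Reed-Solomon-style erasure code: k data fragments plus m parity
    fragments per stripe.
    """
    if replicas is not None:
        return data_tb * replicas        # every byte is stored `replicas` times
    return data_tb * (k + m) / k         # each stripe grows by m/k in parity

# Three-way replication: 300% of the data size, survives 2 node losses.
print(raw_storage(100, replicas=3))      # 300.0 TB
# RS(10, 4): 140% of the data size, survives 4 fragment losses per stripe.
print(raw_storage(100, k=10, m=4))       # 140.0 TB
```

Under these assumed parameters, the erasure code tolerates more failures per stripe than three-way replication while using less than half the raw capacity, which is the trade-off motivating the thesis.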
