Abstract

As fault-tolerant data storage becomes more important, businesses and individuals are moving their data to the cloud, and fault-tolerant cloud storage file systems are now widely available and widely used. The Hadoop Distributed File System (HDFS) has been broadly adopted for building cloud storage systems. The default storage policy in cloud file systems, implemented in HDFS and many others, is triplication (triple replication), favoured for its ease of implementation, high performance, and reliability. However, the storage overhead of triplication is a concern: every byte of user data occupies three bytes of raw storage. We therefore present HDFS in which fault tolerance is achieved by means of erasure-coded replication instead. Because the placement of replicas is critical to HDFS reliability and performance, this work applies the core concept of consistent hashing to placement. To evaluate the performance of HDFS with the erasure-coded replication scheme, we focus on minimal storage space consumption and good storage space utilization, conducting experiments on the original HDFS and on HDFS with erasure-coded replication. The experimental results show that our scheme saves storage space and that utilization is significantly better with erasure coding.
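
The abstract does not give the paper's coding parameters, so the following is an illustrative sketch only: it compares the raw-storage cost of triplication against an assumed (6, 3) Reed-Solomon-style erasure code, a common configuration that tolerates the loss of any three fragments, the same number of failures as triplication tolerates whole-replica losses.

```python
def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per byte of user data under n-way replication."""
    return float(replicas)

def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data under a (k, m) erasure code:
    each stripe holds k data fragments plus m parity fragments."""
    return (k + m) / k

if __name__ == "__main__":
    # Triplication (the HDFS default): 3.0x raw storage per user byte.
    print(f"triplication:        {replication_overhead(3):.2f}x")
    # Assumed (6, 3) code: 1.5x raw storage while still surviving
    # any 3 fragment losses. The exact parameters are an assumption,
    # not taken from the paper.
    print(f"erasure code (6, 3): {erasure_overhead(6, 3):.2f}x")
```

Under these assumed parameters the erasure code stores 1.5 bytes of raw data per user byte versus 3.0 for triplication, which is the kind of storage saving the abstract refers to.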
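
The abstract states that the core concept of consistent hashing is applied to replica placement but does not describe the authors' implementation. The sketch below is a minimal, generic consistent-hashing ring; the node names, the MD5 hash, the virtual-node count, and the nine-fragment stripe are illustrative assumptions, not details from the paper.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hashing ring: maps block IDs to storage nodes
    so that adding or removing a node remaps only about 1/N of the blocks."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node gets `vnodes` positions on the ring to
        # smooth out load imbalance across nodes.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def nodes_for(self, block_id: str, count: int):
        """Return `count` distinct nodes for a block: the first node
        clockwise from the block's hash position, then the next
        distinct nodes encountered around the ring."""
        start = bisect.bisect(self._keys, self._hash(block_id)) % len(self._ring)
        chosen = []
        for _, node in self._ring[start:] + self._ring[:start]:
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == count:
                    break
        return chosen

ring = ConsistentHashRing([f"datanode-{i}" for i in range(10)])
# Place the 9 fragments of an assumed (6, 3) stripe on distinct nodes.
print(ring.nodes_for("block-0001", count=9))
```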
