Abstract

Various techniques have been used in distributed file systems to ensure data availability and stability. Data have typically been stored with a replication technique, but because of its poor space efficiency, the erasure-coding (EC) technique has been adopted more recently. EC offers better space efficiency than replication; however, it introduces several performance degradation factors, such as encoding/decoding overhead and input/output (I/O) degradation. This study therefore proposes a buffering and combining technique in which the multiple I/O requests generated during encoding in an EC-based distributed file system are combined and processed as a single request. In addition, it proposes four recovery measures (disk I/O load distribution, random block layout, multi-thread-based parallel recovery, and a matrix recycle technique) to distribute the disk I/O load generated during decoding.
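
To make one of these measures concrete, the sketch below shows one possible reading of the matrix recycle idea: the decoding matrix depends only on which blocks of a stripe survive, so stripes recovered under the same erasure pattern can reuse a cached matrix instead of re-inverting it each time. The coding parameters (k = 6 data and m = 3 parity blocks), the real-valued Vandermonde matrix, and all function names are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a "matrix recycle" cache: the inverse (decoding)
# matrix is keyed by the erasure pattern, so repeated recoveries with the
# same pattern skip the expensive re-inversion. Illustrative only.
from functools import lru_cache

import numpy as np

K = 6  # data blocks per stripe   (assumed example parameters)
M = 3  # parity blocks per stripe


def vandermonde_encoding_matrix(k: int, m: int) -> np.ndarray:
    """(k+m) x k generator matrix over the reals, for illustration only."""
    return np.array([[float(r ** c) for c in range(k)] for r in range(1, k + m + 1)])


@lru_cache(maxsize=256)
def decoding_matrix(surviving_rows: tuple) -> np.ndarray:
    """Invert the sub-matrix for one erasure pattern and cache ("recycle") it."""
    G = vandermonde_encoding_matrix(K, M)
    sub = G[list(surviving_rows), :]          # rows of the surviving blocks
    return np.linalg.inv(sub)                 # expensive step we want to reuse


def recover_stripe(blocks: dict) -> list:
    """Rebuild the k data blocks from any k surviving blocks of a stripe."""
    surviving = tuple(sorted(blocks))[:K]
    D = decoding_matrix(surviving)            # cache hit if the pattern repeats
    codewords = np.stack([blocks[i] for i in surviving])
    data = D @ codewords                      # recovered data blocks
    return [data[i] for i in range(K)]


# Usage: encode two stripes, lose blocks 0 and 4 in both, recover each stripe;
# the second recovery reuses the cached decoding matrix for the same pattern.
G = vandermonde_encoding_matrix(K, M)
for seed in (1, 2):
    rng = np.random.default_rng(seed)
    data = rng.random((K, 4))                         # toy data blocks
    coded = G @ data                                  # k+m coded blocks
    surviving = {i: coded[i] for i in range(K + M) if i not in (0, 4)}
    assert np.allclose(np.stack(recover_stripe(surviving)), data, atol=1e-6)
print(decoding_matrix.cache_info().hits)              # 1: the matrix was recycled
```

A production coder would work over GF(2^8) (for example, with a library such as ISA-L or Jerasure) rather than over floating-point numbers, but the caching pattern is the same.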

Highlights

  • In recent years, big data-based technologies have been studied in various fields, including artificial intelligence, the Internet of Things, and cloud computing

  • Hadoop consists of distributed file storage technology and parallel processing technology; only the former is discussed in this study

  • The distributed file storage technology in Hadoop is the Hadoop Distributed File System (HDFS), which uses a replication technique: the data to be stored are split into blocks of a fixed size, and each block is replicated and stored [7,8,9] (see the sketch below)
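
The sketch below illustrates that replication scheme in its simplest form: a byte stream is cut into fixed-size blocks, and each block is assigned to several nodes. The 128 MiB block size and the replication factor of 3 are common HDFS defaults, but the round-robin placement is a naive stand-in for HDFS's actual rack-aware policy, and all names are illustrative.

```python
# Minimal sketch of block-based replication: split a file into fixed-size
# blocks, then store each block on several distinct nodes.
import itertools

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MiB, a common HDFS default block size
REPLICAS = 3                     # HDFS's default replication factor


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut the byte stream into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replicas: int = REPLICAS) -> list:
    """Assign each block to `replicas` nodes (naive round-robin placement)."""
    ring = itertools.cycle(nodes)
    return [[next(ring) for _ in range(replicas)] for _ in range(num_blocks)]


# Tiny demo sizes so the example stays cheap to run.
blocks = split_into_blocks(b"x" * 300, block_size=128)        # -> 3 blocks
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
# [['dn1', 'dn2', 'dn3'], ['dn4', 'dn1', 'dn2'], ['dn3', 'dn4', 'dn1']]
```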

Summary

Introduction

Big data-based technologies have been studied in various fields, including artificial intelligence, the Internet of Things, and cloud computing. The need for large-scale storage and distributed file systems that store and process big data efficiently has increased [1,2,3]. The main idea of our paper is an input/output (I/O) buffering and combining technique that combines multiple I/O requests generated during encoding in an EC-based distributed file system and processes them as one. In addition, we propose a disk I/O load balancing technique, a recovery method that distributes the disk I/O load generated during decoding.
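
The sketch below gives a minimal, hypothetical shape for the buffering and combining idea, under assumed parameters that do not come from the paper (a 64 KiB combine threshold and a caller-supplied submit callback): small write requests produced during encoding are held in a buffer and flushed to the underlying storage as one combined request.

```python
# Sketch of I/O buffering and combining: many small writes are accumulated
# and issued as a single larger request once a threshold is reached.
from typing import Callable

STRIPE_SIZE = 64 * 1024          # assumed combine threshold (64 KiB)


class CombiningWriteBuffer:
    def __init__(self, submit: Callable, threshold: int = STRIPE_SIZE):
        self._submit = submit        # issues one combined I/O request
        self._threshold = threshold
        self._pending = []
        self._size = 0

    def write(self, chunk: bytes) -> None:
        """Buffer one small request; combine and submit when the threshold is hit."""
        self._pending.append(chunk)
        self._size += len(chunk)
        if self._size >= self._threshold:
            self.flush()

    def flush(self) -> None:
        """Concatenate all buffered requests and issue them as a single I/O."""
        if self._pending:
            self._submit(b"".join(self._pending))
            self._pending.clear()
            self._size = 0


# Usage: forty 4 KiB encoder outputs become three combined writes.
issued = []
buf = CombiningWriteBuffer(issued.append)
for _ in range(40):
    buf.write(b"\0" * 4096)
buf.flush()
print([len(x) for x in issued])   # [65536, 65536, 32768]
```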

Related Work
Result
Efficient Data Recovery Method
Findings
Conclusions
