Abstract

Various techniques have been used in distributed file systems to ensure data availability and stability. Data have typically been stored with a replication technique, but because of its poor space efficiency, the erasure-coding (EC) technique has been adopted more recently. EC offers better space efficiency than replication; however, it introduces several performance degradation factors, such as encoding/decoding overhead and input/output (I/O) degradation. This study therefore proposes a buffering and combining technique in which the multiple I/O requests generated during encoding in an EC-based distributed file system are combined and processed as a single request. In addition, it proposes four recovery measures (disk I/O load distribution, random block layout, multi-thread-based parallel recovery, and a matrix recycle technique) to distribute the disk I/O load generated during decoding.
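
To make one of these measures concrete, the sketch below shows one possible reading of the matrix recycle idea: the decoding matrix depends only on which blocks of a stripe survive, so stripes recovered under the same erasure pattern can reuse a cached matrix instead of re-inverting it each time. The coding parameters (k = 6 data and m = 3 parity blocks), the real-valued Vandermonde matrix, and all function names are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a "matrix recycle" cache: the inverse (decoding)
# matrix is keyed by the erasure pattern, so repeated recoveries with the
# same pattern skip the expensive re-inversion. Illustrative only.
from functools import lru_cache

import numpy as np

K = 6  # data blocks per stripe   (assumed example parameters)
M = 3  # parity blocks per stripe


def vandermonde_encoding_matrix(k: int, m: int) -> np.ndarray:
    """(k+m) x k generator matrix over the reals, for illustration only."""
    return np.array([[float(r ** c) for c in range(k)] for r in range(1, k + m + 1)])


@lru_cache(maxsize=256)
def decoding_matrix(surviving_rows: tuple) -> np.ndarray:
    """Invert the sub-matrix for one erasure pattern and cache ("recycle") it."""
    G = vandermonde_encoding_matrix(K, M)
    sub = G[list(surviving_rows), :]          # rows of the surviving blocks
    return np.linalg.inv(sub)                 # expensive step we want to reuse


def recover_stripe(blocks: dict) -> list:
    """Rebuild the k data blocks from any k surviving blocks of a stripe."""
    surviving = tuple(sorted(blocks))[:K]
    D = decoding_matrix(surviving)            # cache hit if the pattern repeats
    codewords = np.stack([blocks[i] for i in surviving])
    data = D @ codewords                      # recovered data blocks
    return [data[i] for i in range(K)]


# Usage: encode two stripes, lose blocks 0 and 4 in both, recover each stripe;
# the second recovery reuses the cached decoding matrix for the same pattern.
G = vandermonde_encoding_matrix(K, M)
for seed in (1, 2):
    rng = np.random.default_rng(seed)
    data = rng.random((K, 4))                         # toy data blocks
    coded = G @ data                                  # k+m coded blocks
    surviving = {i: coded[i] for i in range(K + M) if i not in (0, 4)}
    assert np.allclose(np.stack(recover_stripe(surviving)), data, atol=1e-6)
print(decoding_matrix.cache_info().hits)              # 1: the matrix was recycled
```

A production coder would work over GF(2^8) (for example, with a library such as ISA-L or Jerasure) rather than over floating-point numbers, but the caching pattern is the same.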

Highlights

  • In recent years, big data-based technologies have been studied in various fields, including artificial intelligence, the Internet of Things, and cloud computing

  • Hadoop consists of distributed file storage technology and parallel processing technology; only the former is discussed in this study

  • The distributed file storage technology in Hadoop is the Hadoop Distributed File System (HDFS), which uses a replication technique: the data to be stored are split into blocks of a fixed size, and each block is replicated and stored [7,8,9] (see the sketch below)
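
The sketch below illustrates that replication scheme in its simplest form: a byte stream is cut into fixed-size blocks, and each block is assigned to several nodes. The 128 MiB block size and the replication factor of 3 are common HDFS defaults, but the round-robin placement is a naive stand-in for HDFS's actual rack-aware policy, and all names are illustrative.

```python
# Minimal sketch of block-based replication: split a file into fixed-size
# blocks, then store each block on several distinct nodes.
import itertools

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MiB, a common HDFS default block size
REPLICAS = 3                     # HDFS's default replication factor


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut the byte stream into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replicas: int = REPLICAS) -> list:
    """Assign each block to `replicas` nodes (naive round-robin placement)."""
    ring = itertools.cycle(nodes)
    return [[next(ring) for _ in range(replicas)] for _ in range(num_blocks)]


# Tiny demo sizes so the example stays cheap to run.
blocks = split_into_blocks(b"x" * 300, block_size=128)        # -> 3 blocks
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
# [['dn1', 'dn2', 'dn3'], ['dn4', 'dn1', 'dn2'], ['dn3', 'dn4', 'dn1']]
```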

Summary

Introduction

Big data-based technologies have been studied in various fields, including artificial intelligence, the Internet of Things, and cloud computing. The need for large-scale storage and distributed file systems that store and process big data efficiently has increased [1,2,3]. The main idea of our paper is an input/output (I/O) buffering and combining technique that combines multiple I/O requests generated during encoding in an EC-based distributed file system and processes them as one. In addition, we propose a disk I/O load balancing technique, a recovery method that distributes the disk I/O load generated during decoding.
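
The sketch below gives a minimal, hypothetical shape for the buffering and combining idea, under assumed parameters that do not come from the paper (a 64 KiB combine threshold and a caller-supplied submit callback): small write requests produced during encoding are held in a buffer and flushed to the underlying storage as one combined request.

```python
# Sketch of I/O buffering and combining: many small writes are accumulated
# and issued as a single larger request once a threshold is reached.
from typing import Callable

STRIPE_SIZE = 64 * 1024          # assumed combine threshold (64 KiB)


class CombiningWriteBuffer:
    def __init__(self, submit: Callable, threshold: int = STRIPE_SIZE):
        self._submit = submit        # issues one combined I/O request
        self._threshold = threshold
        self._pending = []
        self._size = 0

    def write(self, chunk: bytes) -> None:
        """Buffer one small request; combine and submit when the threshold is hit."""
        self._pending.append(chunk)
        self._size += len(chunk)
        if self._size >= self._threshold:
            self.flush()

    def flush(self) -> None:
        """Concatenate all buffered requests and issue them as a single I/O."""
        if self._pending:
            self._submit(b"".join(self._pending))
            self._pending.clear()
            self._size = 0


# Usage: forty 4 KiB encoder outputs become three combined writes.
issued = []
buf = CombiningWriteBuffer(issued.append)
for _ in range(40):
    buf.write(b"\0" * 4096)
buf.flush()
print([len(x) for x in issued])   # [65536, 65536, 32768]
```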

Related Work
Result
Efficient Data Recovery Method
Findings
Conclusions
