Abstract
With the rapid development of information technologies, the big data industry has experienced explosive data growth. Because of the enormous business and research value behind large-scale data, big data technology has become one of the hottest research fields in both academia and industry. Storing and processing such data is a core function of distributed file systems. The Hadoop Distributed File System (HDFS) is the most typical distributed big data architecture, characterized by high reliability, high fault tolerance, and low hardware cost. HDFS achieves efficient task execution and fault tolerance among nodes through its Remote Procedure Call (RPC) mechanism. However, the traditional RPC mechanism has a notable defect: it relies on a fixed timing method and cannot quickly distinguish node failure from network congestion. In this paper, we propose an improved adaptive RPC mechanism for node fault tolerance and performance. The proposed method classifies nodes by their data-block access statistics and dynamically adjusts the RPC interval accordingly. This reduces network traffic and the processing pressure on the NameNode, and improves I/O and task performance. Building on the node classification, an in-rack method is used to further improve fault-tolerance performance. Finally, we design extensive experiments to evaluate the proposed method, and the experimental results show that it outperforms the state-of-the-art method.
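To illustrate the idea of an adaptive RPC interval driven by access statistics, the following is a minimal sketch. The class names, thresholds, and interval values are illustrative assumptions for exposition, not the paper's actual implementation or HDFS's configuration defaults.

```python
# Hypothetical sketch: classify a DataNode by how often its data blocks
# are accessed, then derive an adaptive heartbeat (RPC) interval from the
# class. Hot nodes report more often (faster failure detection where it
# matters); cold nodes report less often (less NameNode RPC pressure).
# All thresholds and interval values below are assumptions.

BASE_INTERVAL_S = 3.0   # baseline heartbeat interval (illustrative)
MIN_INTERVAL_S = 1.0    # interval for frequently accessed ("hot") nodes
MAX_INTERVAL_S = 10.0   # interval for rarely accessed ("cold") nodes


def classify_node(access_count: int,
                  hot_threshold: int = 1000,
                  cold_threshold: int = 100) -> str:
    """Classify a node by its data-block access statistic."""
    if access_count >= hot_threshold:
        return "hot"
    if access_count <= cold_threshold:
        return "cold"
    return "warm"


def heartbeat_interval(access_count: int) -> float:
    """Map the node class to an RPC interval in seconds."""
    cls = classify_node(access_count)
    if cls == "hot":
        return MIN_INTERVAL_S
    if cls == "cold":
        return MAX_INTERVAL_S
    return BASE_INTERVAL_S
```

Under this kind of policy, the aggregate heartbeat traffic reaching the NameNode shrinks because lightly used nodes report less frequently, while heavily used nodes retain tight failure-detection latency.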