Abstract

In the age of big data, MapReduce has emerged as an important tool for processing massive datasets in parallel on clusters, and Hadoop is an open-source implementation of it. However, as clusters grow in size, it becomes increasingly difficult to identify and diagnose faulty nodes, especially those that continue running but with degraded performance. Based on the observation that all nodes in a cluster behave in relatively similar ways, we propose a peer-comparison approach that automatically diagnoses performance problems in a Hadoop cluster by extracting and analyzing both Hadoop logs and OS-level performance metrics on each node. Compared with previous work, our approach is more scalable and effective, and it can pinpoint the underlying bug of a faulty node in Hadoop clusters.
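A minimal sketch of the peer-comparison idea, assuming each node is summarized by a single per-node metric (e.g. mean CPU utilization over a window) and using a simple median/MAD outlier rule; the paper's actual statistical test and metric selection are not reproduced here:

```python
from statistics import median

def flag_outlier_nodes(metrics, threshold=3.0):
    """Flag nodes whose metric deviates from the cluster median.

    metrics: dict mapping node name -> metric value.
    Hypothetical illustration: compares each node against its peers
    using the median absolute deviation (MAD) as a robust spread.
    """
    values = list(metrics.values())
    med = median(values)
    # MAD: robust to the faulty nodes we are trying to detect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [node for node, v in metrics.items()
            if abs(v - med) / mad > threshold]

# Example: node4's CPU usage deviates sharply from its peers.
cpu = {"node1": 0.42, "node2": 0.45, "node3": 0.41, "node4": 0.97}
print(flag_outlier_nodes(cpu))  # -> ['node4']
```

Because all healthy nodes run similar workloads, no fault-free baseline or training data is needed: the peers themselves serve as the reference.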
