Abstract

In the age of big data, MapReduce has emerged as an important tool for processing massive datasets in parallel on clusters, and Hadoop is an open-source implementation of it. However, as clusters grow in size, it becomes increasingly difficult to identify and diagnose faulty nodes, especially those that continue running but with degraded performance. Based on the observation that all nodes in a cluster behave in relatively similar ways, we propose a peer-comparison approach that automatically diagnoses performance problems in a Hadoop cluster by extracting and analyzing both Hadoop logs and OS-level performance metrics on each node. Compared with previous work, our approach is more scalable and effective, and it can pinpoint the underlying bug of a faulty node in Hadoop clusters.
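A minimal sketch of the peer-comparison idea, assuming each node is summarized by a single per-node metric (e.g. mean CPU utilization over a window) and using a simple median/MAD outlier rule; the paper's actual statistical test and metric selection are not reproduced here:

```python
from statistics import median

def flag_outlier_nodes(metrics, threshold=3.0):
    """Flag nodes whose metric deviates from the cluster median.

    metrics: dict mapping node name -> metric value.
    Hypothetical illustration: compares each node against its peers
    using the median absolute deviation (MAD) as a robust spread.
    """
    values = list(metrics.values())
    med = median(values)
    # MAD: robust to the faulty nodes we are trying to detect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [node for node, v in metrics.items()
            if abs(v - med) / mad > threshold]

# Example: node4's CPU usage deviates sharply from its peers.
cpu = {"node1": 0.42, "node2": 0.45, "node3": 0.41, "node4": 0.97}
print(flag_outlier_nodes(cpu))  # -> ['node4']
```

Because all healthy nodes run similar workloads, no fault-free baseline or training data is needed: the peers themselves serve as the reference.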
