Abstract

Hadoop distributed computing clusters are used worldwide for high-performance computations. Often various hardware and software faults occur, leading to both data and computation time losses. This paper proposes the usage of a fault prediction software called 'Zedacross' which uses machine learning principles combined with cluster monitoring tools. Firstly, the paper suggests a model that uses the resource usage statistics of a normally functioning Hadoop cluster to create a machine learning model that can then be used to predict and detect faults in real time. Secondly, the paper explains the novel idea of using higher order differentials as inputs to SVM for highly accurate fault predictions. Predictions of system faults by observing system resource usage statistics in real-time with minimum delay will play a vital role in deciding the need for job rescheduling tasks or even dynamic up-scaling of the cluster. To demonstrate the effectiveness of the design a Java utility was built to perform cluster fault monitoring. The results obtained after running the system on various test cases demonstrate that the proposed method is accurate and effective.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call