Abstract

The Hadoop platform provides a powerful software framework for distributed storage and processing of massive amounts of data. It is at the heart of big data processing and has found numerous applications in diverse areas, ranging from environmental monitoring to security analysis. To facilitate the storage and processing of big data, a Hadoop platform typically runs on a cluster of servers and may scale up to process big data over thousands of hardware nodes. However, the growing scale and complexity of the Hadoop platform also make it increasingly challenging to manage and operate. In this paper, we present a framework called LogM that leverages not only deep learning but also knowledge graph technology for failure prediction and analysis of Hadoop clusters. In particular, we first develop a CAB-net (Convolutional Neural Network (CNN) with Attention-based Bi-directional Long Short-Term Memory (Bi-LSTM)) architecture to effectively learn the temporal dynamics of sequential log data, which allows us to predict system failures. We then adopt a knowledge graph approach for failure analysis and diagnosis. Extensive experiments have been carried out to assess the performance of the proposed approach. The results show that LogM is highly effective in predicting and diagnosing system failures.
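The abstract describes the CAB-net pipeline only at a high level: a CNN extracts local features from embedded log-event sequences, a Bi-LSTM captures temporal dependencies, and an attention layer weights the hidden states before a binary failure score is produced. The paper's own implementation is not given here; the following is a minimal NumPy sketch of such a forward pass with toy random weights, where all layer sizes, parameter names, and the single-filter-width convolution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def conv1d_relu(x, kernels):
    """1-D convolution over a (T, d) embedded log sequence; kernels: (k, d, f)."""
    k, _, f = kernels.shape
    T = x.shape[0]
    out = np.empty((T - k + 1, f))
    for t in range(T - k + 1):
        # Contract the (k, d) window against every filter at once.
        out[t] = np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)  # ReLU

def lstm(x, W, U, b):
    """Single-direction LSTM; W: (4h, f), U: (4h, h), b: (4h,). Returns (T, h)."""
    h_dim = U.shape[1]
    h, c = np.zeros(h_dim), np.zeros(h_dim)
    hs = []
    for x_t in x:
        i, f_, o, g = np.split(W @ x_t + U @ h + b, 4)  # input/forget/output/candidate
        c = sigmoid(f_) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        hs.append(h)
    return np.stack(hs)

def cab_net_forward(x, params):
    """CNN -> Bi-LSTM -> attention -> binary failure score (illustrative sketch)."""
    feats = conv1d_relu(x, params["conv"])          # (T', f) local log features
    h_fw = lstm(feats, *params["fw"])               # forward direction
    h_bw = lstm(feats[::-1], *params["bw"])[::-1]   # backward direction
    H = np.concatenate([h_fw, h_bw], axis=1)        # (T', 2h) Bi-LSTM states
    alpha = softmax(np.tanh(H @ params["att"]))     # attention weights over time
    context = alpha @ H                             # attention-weighted summary
    return sigmoid(params["out"] @ context), alpha  # failure probability

# Toy dimensions: 10 log events, embedding 8, kernel width 3, 6 filters, hidden 5.
T, d, k, f, h = 10, 8, 3, 6, 5
params = {
    "conv": rng.normal(size=(k, d, f)) * 0.1,
    "fw": (rng.normal(size=(4 * h, f)) * 0.1,
           rng.normal(size=(4 * h, h)) * 0.1, np.zeros(4 * h)),
    "bw": (rng.normal(size=(4 * h, f)) * 0.1,
           rng.normal(size=(4 * h, h)) * 0.1, np.zeros(4 * h)),
    "att": rng.normal(size=2 * h) * 0.1,
    "out": rng.normal(size=2 * h) * 0.1,
}
prob, alpha = cab_net_forward(rng.normal(size=(T, d)), params)
```

In a trained model the weights would come from supervised learning on labeled log windows; here they are random, so `prob` is only a demonstration of the data flow, not a meaningful prediction.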
