A pragmatic approach to predict hardware failures in storage systems using MPP database and big data technologies

Rohit Kumar, Senthilkumar Vijayakumar, Syed Azar Ahamed

doi:10.1109/iadcc.2014.6779422

Abstract

A storage system in a data center consists of various components such as Disk Array Enclosures (DAEs), disks, processors, servers, hosts running different applications, and so on. Hard disk and server failures are infrequent but often very costly, and they can have a severe adverse effect on an organization's business. The ability to accurately predict an impending disk or server failure is therefore an essential capability when designing a reliable, fault-tolerant, and continuously available storage system. This paper explains a novel approach to predicting hardware failures with a spectrum-kernel Parallel Support Vector Machine (Parallel SVM) that analyzes the system events logged in the system log files. These log files not only record the events processed by the system but also hold the messages written as the system state changes. A single message in the system log file is insufficient for reliable prediction, and any prediction based on it is bound to be less accurate. The approach introduced in this paper instead uses a sequence, or pattern, of messages from the system log file, extracted with a sliding window of 5 messages, to predict the likelihood of a failure. These sliding windows of message sequences act as inputs to the Parallel SVM, which then tags each message sequence as belonging to a failing or non-failing system. Data mining techniques are used to extract useful information from the raw dataset, and a solution model is developed using the resulting structured dataset and machine learning algorithms. When implemented using actual system logs from a Linux-based storage system, this environment has been shown to predict a hardware failure with an accuracy of 90-92 percent.
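The sliding-window and spectrum-kernel ideas described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the message identifiers (`E1`, `E2`, ...), the helper names, and the subsequence length `k=2` are assumptions chosen for illustration; only the window size of 5 comes from the abstract. The sketch cuts a log into overlapping windows and computes a spectrum kernel, which scores two windows by the number of length-`k` message subsequences they share.

```python
from collections import Counter
from typing import List, Sequence, Tuple

WINDOW_SIZE = 5  # window size stated in the abstract


def sliding_windows(messages: Sequence[str], size: int = WINDOW_SIZE) -> List[Tuple[str, ...]]:
    """Return every run of `size` consecutive messages, advancing one message at a time."""
    return [tuple(messages[i:i + size]) for i in range(len(messages) - size + 1)]


def kmer_counts(window: Sequence[str], k: int) -> Counter:
    """Count each contiguous length-k subsequence (k-mer) of message IDs in a window."""
    return Counter(tuple(window[i:i + k]) for i in range(len(window) - k + 1))


def spectrum_kernel(a: Sequence[str], b: Sequence[str], k: int = 2) -> int:
    """Spectrum kernel: inner product of the k-mer count vectors of two windows.

    k=2 is an illustrative assumption, not a value taken from the paper.
    """
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())


# Hypothetical message IDs standing in for parsed system log messages.
log = ["E1", "E2", "E3", "E1", "E2", "E3", "E7"]
windows = sliding_windows(log)

# Gram matrix of pairwise kernel values between all windows.
gram = [[spectrum_kernel(a, b) for b in windows] for a in windows]
```

A Gram matrix built this way could, for example, be passed to an SVM implementation that accepts precomputed kernels, with each window labeled as failure or non-failure. The parallelization that the paper's Parallel SVM relies on is outside the scope of this sketch.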

Full Text