Abstract

Failure of hard disk drives (HDDs) is the most critical reliability issue in data centers. Predicting HDD failure is therefore an important means of ensuring the storage security of a data center. However, most current research has overlooked the fact that the self-monitoring, analysis and reporting technology (SMART) data of a returned failed HDD form a long-term sequence consisting largely of unlabeled data, in which healthy and faulty data are highly mixed. Because the failure data from the rapid degradation period are far fewer than the healthy data from the normal state, this mixture results in extreme data imbalance. This makes it very difficult to find the hidden fault information, and failure prediction therefore becomes a hard task. To cope with these problems, a multi-instance long-term data classification method based on a long short-term memory (LSTM) network and an attention mechanism is proposed to predict HDD failures. Treating each long-term HDD sequence as an instance bag, multi-instance learning (MIL) divides the bag into multiple instances at the subconcept layer and then learns the relationship between the instances and the bag label. In an analysis of HDD data from a communication company and from the Backblaze data center, the proposed method obtains much better results than the other methods compared.
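The bag-and-instance idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that instances are non-overlapping fixed-length windows of the SMART sequence. For brevity it replaces the LSTM encoder with a per-feature mean, and it pools instances with a common attention formulation (score = w·tanh(V h), softmax over instances) followed by a logistic bag classifier. The helper names, the window size, the parameters V, w, u, b and the toy trace are all hypothetical and untrained.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def split_into_instances(bag: Sequence[Vector], window: int) -> List[List[Vector]]:
    """Split one drive's long SMART sequence (the bag) into consecutive,
    non-overlapping windows (the instances). A trailing partial window is kept."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [list(bag[i:i + window]) for i in range(0, len(bag), window)]


def encode_instance(instance: List[Vector]) -> Vector:
    """Placeholder encoder (assumption): per-feature mean over the window.
    In the method described above, an LSTM would produce this embedding."""
    n, dim = len(instance), len(instance[0])
    return [sum(step[d] for step in instance) / n for d in range(dim)]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(embeddings: List[Vector], V: List[Vector], w: Vector) -> Tuple[Vector, Vector]:
    """Score each instance as w . tanh(V h), normalize the scores with softmax,
    and return the attention-weighted bag embedding and the weights."""
    scores = []
    for h in embeddings:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * xi for wi, xi in zip(w, hidden)))
    alphas = softmax(scores)
    dim = len(embeddings[0])
    bag = [sum(a * h[d] for a, h in zip(alphas, embeddings)) for d in range(dim)]
    return bag, alphas


def bag_failure_probability(bag_emb: Vector, u: Vector, b: float) -> float:
    """Logistic classifier applied to the pooled bag embedding."""
    z = sum(ui * xi for ui, xi in zip(u, bag_emb)) + b
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 2-feature SMART trace: 6 healthy days followed by 2 degraded days.
trace = [[0.0, 0.0]] * 6 + [[5.0, 3.0]] * 2
instances = split_into_instances(trace, window=2)  # 4 instances
embs = [encode_instance(inst) for inst in instances]
V = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical, untrained parameters
w = [1.0, 1.0]
bag_emb, alphas = attention_pool(embs, V, w)
p = bag_failure_probability(bag_emb, u=[0.5, 0.5], b=-1.0)
# The degraded (last) instance receives the largest attention weight.
```

Only the bag carries a label here, and the attention weights show which instances drive the bag-level decision. This is how MIL sidesteps the unlabeled, mixed healthy and faulty records inside a failed drive's history.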

