Abstract

In data centers, hard disks are the most failure-prone IT equipment. Although data is backed up, hard disk failures still challenge data reliability. In recent years, many hard disk failure prediction approaches based on SMART data have been proposed. In this paper, we propose a novel disk failure prediction approach based on the LightGBM algorithm with CID (complexity-invariant distance) features. Our failure prediction model was built and evaluated on SMART data from about 80,000 hard disks from two manufacturers. The experimental results show that adding CID features increases the true positive rate (TPR) from 0.28 to 0.96 and extends the number of days by which the model can predict failures in advance by 1.2 days. Compared with several existing failure prediction models, our model achieves better AUC, F1-score, and TPR.
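
For readers unfamiliar with CID, the following is a minimal sketch of the complexity-invariant distance as it is commonly defined in the time-series literature: the Euclidean distance between two equal-length series, scaled by the ratio of their complexity estimates. The abstract does not state how CID features are derived from SMART attributes, so the function names, the handling of flat series, and the sample values below are illustrative assumptions rather than the paper's implementation.

```python
import math


def complexity_estimate(series):
    """Complexity estimate: root of the summed squared successive differences."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(series, series[1:])))


def euclidean_distance(q, c):
    """Plain Euclidean distance between two equal-length series."""
    if len(q) != len(c):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(q, c)))


def cid(q, c):
    """Complexity-invariant distance: Euclidean distance times a complexity ratio."""
    ed = euclidean_distance(q, c)
    ce_q = complexity_estimate(q)
    ce_c = complexity_estimate(c)
    low, high = min(ce_q, ce_c), max(ce_q, ce_c)
    if low == 0.0:
        # Assumption for this sketch: two flat series get a correction factor
        # of 1, while a flat series against a non-flat one is infinitely far.
        return ed if high == 0.0 else math.inf
    return ed * (high / low)


# Hypothetical readings of one SMART attribute over four days for two disks.
disk_a = [0.0, 1.0, 0.0, 1.0]
disk_b = [0.0, 2.0, 0.0, 2.0]
distance = cid(disk_a, disk_b)
```

In this toy example the second series is twice as "complex" as the first, so the Euclidean distance is doubled. A value like this could be supplied to a gradient-boosting classifier as an extra feature alongside the raw SMART attributes, although that pairing is only one possible design.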
