Abstract

Many real-world datasets, such as those used for failure and anomaly detection, are severely imbalanced, with far fewer failed instances than normal ones. To address this imbalance, this paper leverages the Backblaze hard disk drive (HDD) data and makes several contributions to hard drive failure prediction. The research explores 1D convolutional neural networks (CNNs) to exploit the sequential nature of hard drive sensor data. The 1D CNN models employ weighted binary cross-entropy (WCE) loss and modified focal loss (MFL) functions to manage the class imbalance commonly observed in hard drive data. Their performance is compared with traditional machine learning (ML) approaches, such as those based on the synthetic minority over-sampling technique (SMOTE) and weighted logistic regression (WLR). The findings suggest that the 1D CNN models outperform the traditional ML models, with regularization techniques such as dropout and early stopping proving effective in controlling overfitting. Notably, the 1D CNN model with WCE loss demonstrated the best overall performance, with a $$G_{mean}$$ of 0.692, successfully maximizing the failure detection rate (FDR) without increasing the false alarm rate (FAR). In parallel, the paper uses survival analysis to provide a comprehensive understanding of hard drive longevity and the critical factors contributing to eventual failure, employing Cox regression to identify the key SMART (Self-Monitoring, Analysis and Reporting Technology) parameters influencing drive failure. The high concordance index (c-index) of the Cox model (0.958) adds confidence to the insights derived. The research thus lays the groundwork for data center management strategies, with future work focusing on practical implementation and evaluation of these findings.
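The abstract does not define $$G_{mean}$$ or the WCE loss. As a minimal sketch, the snippet below assumes the definition commonly used in drive failure prediction, the geometric mean of FDR and (1 - FAR), and assumes that WCE scales the failed-class term of binary cross-entropy by a weight. The function names and the `pos_weight` parameter are illustrative and are not taken from the paper.

```python
import math


def fdr_far(y_true, y_pred):
    """Return (FDR, FAR) for binary labels where 1 = failed and 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fdr = tp / (tp + fn) if tp + fn else 0.0  # share of failures detected
    far = fp / (fp + tn) if fp + tn else 0.0  # share of healthy drives flagged
    return fdr, far


def g_mean(fdr, far):
    """Assumed definition: sqrt(FDR * (1 - FAR))."""
    return math.sqrt(fdr * (1.0 - far))


def weighted_bce(y_true, p_pred, pos_weight, eps=1e-12):
    """Mean binary cross-entropy with the failed-class term scaled by pos_weight.

    pos_weight is a hypothetical parameter; the paper's weighting may differ.
    """
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)
```

Under these assumptions, a model that detects half of the failures (FDR = 0.5) while flagging a quarter of the healthy drives (FAR = 0.25) scores `g_mean(0.5, 0.25)`, roughly 0.612. A `pos_weight` above 1 makes a missed failure cost more than a false alarm, which is one way to counter the class imbalance.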