Abstract

Many real-world datasets, such as those used for failure and anomaly detection, are severely imbalanced: failed instances are few compared with normal instances. To address this imbalance, this paper leverages the Backblaze hard disk drive (HDD) data and makes several contributions to hard drive failure prediction. This research explores 1D convolutional neural networks (CNNs) to exploit the sequential nature of hard drive sensor data. The 1D CNN models employ weighted binary cross-entropy (WCE) loss and modified focal loss (MFL) functions to manage the class imbalance commonly observed in hard drive data. Their performance is compared with traditional machine learning (ML) approaches, such as the synthetic minority over-sampling technique (SMOTE) and weighted logistic regression (WLR). The findings suggest that 1D CNN models outperform these traditional ML models, with regularization techniques such as dropout and early stopping proving effective in controlling overfitting. Notably, the 1D CNN model with WCE loss achieved the best overall performance, with a $G_{mean}$ of 0.692, maximizing the failure detection rate (FDR) without increasing the false alarm rate (FAR). In addition, this paper aims to provide a comprehensive understanding of hard drive longevity and the critical factors contributing to drive failure through survival analysis: Cox regression is used to identify the key SMART (Self-Monitoring, Analysis and Reporting Technology) parameters influencing drive failure, and the high concordance index (c-index) of the Cox model (0.958) adds confidence to the insights derived. The research thus lays a solid groundwork for data center management strategies, with future work focusing on the practical implementation and evaluation of these findings.
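The abstract reports $G_{mean}$ together with FDR and FAR but does not define it. A common definition in HDD failure prediction, assumed in the sketch below, is the geometric mean of the failure detection rate (recall on failed drives) and the true negative rate, $G_{mean} = \sqrt{FDR \times (1 - FAR)}$. The function name `detection_metrics` and the example counts are hypothetical and are not results from the paper.

```python
import math


def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute FDR, FAR and G-mean from confusion-matrix counts.

    Assumed definitions (common in HDD failure prediction, not confirmed by the paper):
      FDR    = TP / (TP + FN)   fraction of failed drives correctly flagged
      FAR    = FP / (FP + TN)   fraction of healthy drives wrongly flagged
      G_mean = sqrt(FDR * (1 - FAR))
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one failed and one healthy instance")
    fdr = tp / (tp + fn)
    far = fp / (fp + tn)
    return {"FDR": fdr, "FAR": far, "G_mean": math.sqrt(fdr * (1.0 - far))}


# Hypothetical counts for illustration only.
metrics = detection_metrics(tp=50, fn=50, fp=10, tn=990)
print(metrics)
```

Because G-mean multiplies the two rates, a model that never flags a failure (FDR = 0) scores 0 even though its accuracy on heavily imbalanced data would look high, which is why such a metric suits this setting better than plain accuracy.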
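The abstract names weighted binary cross-entropy (WCE) and a modified focal loss (MFL) but gives no formulas. As a minimal sketch, the code below shows the standard weighted cross-entropy and the standard, unmodified binary focal loss; the paper's specific modification is not described here and is not reproduced. The parameters `pos_weight`, `gamma` and `alpha`, and their default values, are illustrative assumptions.

```python
import math
from typing import Sequence

EPS = 1e-7  # clip probabilities away from 0 and 1 so log() stays finite


def _clip(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def weighted_bce(y_true: Sequence[int], p_pred: Sequence[float], pos_weight: float) -> float:
    """Mean weighted binary cross-entropy.

    Failed drives (y = 1) are up-weighted by pos_weight so the rare class
    contributes more to the loss; pos_weight = 1 gives ordinary BCE.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = _clip(p)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def focal_loss(
    y_true: Sequence[int],
    p_pred: Sequence[float],
    gamma: float = 2.0,
    alpha: float = 0.25,
) -> float:
    """Mean standard (unmodified) binary focal loss.

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    examples, so training concentrates on hard, often minority-class, cases.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = _clip(p)
        p_t = p if y == 1 else 1.0 - p          # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing factor
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)
```

With `gamma = 0` and `alpha = 0.5` the focal loss reduces to half of ordinary cross-entropy, which makes the role of the focusing term easy to check.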
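The c-index of 0.958 summarizes how well the Cox model's predicted risk orders the observed failure times. A minimal O(n²) sketch of Harrell's concordance index follows, assuming right-censored data given as `times`, `events` (1 = failed, 0 = censored) and `risk` scores where a higher score means earlier expected failure. It is an illustration, not the implementation used in the paper, and tie handling differs between libraries.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when drive i failed (events[i] == 1) strictly
    before times[j]. The pair is concordant when drive i received the higher
    risk score; tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored drive cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so a c-index of 0.958 indicates that the model ranks drives by failure risk very close to perfectly.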
