Disk Failure Prediction Research Articles

Disk failure has always been a major problem for data centers, leading to data loss. Current disk failure prediction approaches are mostly offline and assume that the disk labels required for training learning models are available and accurate. However, these offline methods are no longer suitable for disk failure prediction in large-scale data centers. Given the explosive volume of data, most methods do not consider that labels may be hard to obtain during training, or that the labels obtained may not be completely accurate. These problems further restrict the development of supervised learning and offline modeling in disk failure prediction. This article proposes Active Semi-supervised Learning Disk-failure Prediction (ASLDP), a novel disk failure prediction method that combines active learning and semi-supervised learning. According to the characteristics of data in the disk lifecycle, ASLDP applies active learning to clearly labeled samples, selecting the valuable samples with the greatest probability uncertainty and eliminating redundancy. For samples that are unclearly labeled or unlabeled, ASLDP uses semi-supervised learning to pre-label them by calculating the conditional values of the samples, and enhances generalization ability through active learning. Compared with several state-of-the-art offline and online learning approaches, the results on four real-world datasets from Backblaze and Baidu demonstrate that ASLDP achieves stable failure detection rates of 80–85% with low false alarm rates. In addition, we use a dataset from Alibaba to evaluate the generality of ASLDP. Furthermore, ASLDP can overcome the problems of missing sample labels and data redundancy in large data centers, which, to the best of our knowledge, no offline learning method for disk failure prediction considers or addresses. Finally, ASLDP can predict disk failure 4.9 days in advance with lower overhead and latency.
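The active-learning step described in this abstract, picking the samples with the greatest probability uncertainty, can be illustrated with generic uncertainty sampling. The sketch below is not ASLDP's actual selection rule, which the abstract does not specify; it assumes a binary classifier whose predicted failure probabilities are already available, and the function name `select_most_uncertain` and the sample values are hypothetical.

```python
def select_most_uncertain(probabilities, k):
    """Return the indices of the k samples whose predicted failure
    probability is closest to 0.5, i.e. where a binary classifier
    is least confident. Generic uncertainty sampling, assumed here
    for illustration only."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:k]


# Hypothetical predicted failure probabilities for five disk samples.
probs = [0.02, 0.48, 0.91, 0.55, 0.30]

# Samples 1 and 3 sit nearest the decision boundary, so they would be
# the ones sent for labeling in this sketch.
picked = select_most_uncertain(probs, 2)
```

Samples the model already classifies confidently (such as 0.02 or 0.91 here) are skipped, which is one way such a scheme can cut labeling cost and redundancy.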

Prediction accuracy (true positives, false positives, and so on) is the usual way to evaluate disk-failure prediction models. Realistically, however, we aim not only to correctly predict failures but also to protect data against failure, i.e., we need to take appropriate action after a failure prediction. In the context of storage systems, protecting data requires that we migrate at-risk data, but this consumes network and disk bandwidth, which is particularly problematic for large-scale and cloud systems. This paper consolidates and builds on Li et al. (2016), where we propose using two new metrics, migration rate (MR) and mismigration rate (MMR), to measure the quality of disk failure prediction: MR measures how much at-risk data is migrated (and therefore protected) as a result of correct failure predictions, while MMR measures how much data is migrated needlessly as a result of incorrect failure predictions. In this paper, we additionally propose measuring quality in terms of migration time and mismigration time, which measure the time spent migrating at-risk disks and the time spent mismigrating healthy disks because of false alarms, respectively. To demonstrate these metrics’ usefulness, we use them to compare disk-failure prediction methods: 1) a classification tree (CT) model against a state-of-the-art recurrent neural network (RNN) model and 2) a gradient-boosted regression tree (GBRT) model (which predicts residual life) against RNN. We observe that while RNN performs best in the prediction accuracy experiments, the CT and GBRT models sometimes outperform RNN in the resource-dependent migration-rate experiments. We conclude that prediction accuracy is sometimes misleading: correct predictions do not necessarily imply protected data. We additionally present an improved GBRT model (GBRT+), which offers a practical improvement in disk residual-life prediction according to the newly proposed metrics.
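One possible reading of the MR and MMR definitions in this abstract is sketched below. The exact formulas are given in the paper, not here, so the ratios are assumptions: MR is taken as migrated data on disks that really failed divided by all data on those disks, and MMR as migrated data on healthy disks divided by all data on healthy disks. The function name, record fields, and sample figures are hypothetical.

```python
def migration_rates(disks):
    """Compute (MR, MMR) under the assumed definitions above.

    Each disk is a dict with hypothetical keys:
      'failed'      - True if the disk actually failed
      'data_gb'     - data stored on the disk
      'migrated_gb' - data moved off the disk after a failure prediction
    """
    at_risk = sum(d["data_gb"] for d in disks if d["failed"])
    healthy = sum(d["data_gb"] for d in disks if not d["failed"])
    protected = sum(d["migrated_gb"] for d in disks if d["failed"])
    wasted = sum(d["migrated_gb"] for d in disks if not d["failed"])
    # Guard against empty groups to avoid division by zero.
    mr = protected / at_risk if at_risk else 0.0
    mmr = wasted / healthy if healthy else 0.0
    return mr, mmr


# Made-up fleet: two failed disks (one only half migrated in time, one
# never flagged) and two healthy disks (one partly migrated after a
# false alarm).
fleet = [
    {"failed": True, "data_gb": 1000, "migrated_gb": 500},
    {"failed": True, "data_gb": 1000, "migrated_gb": 0},
    {"failed": False, "data_gb": 1000, "migrated_gb": 100},
    {"failed": False, "data_gb": 3000, "migrated_gb": 0},
]
mr, mmr = migration_rates(fleet)
```

In this made-up fleet the first failed disk was predicted correctly, yet only half of its data was moved in time, which mirrors the abstract's point that a correct prediction does not by itself mean the data was protected.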

Articles published on Disk Failure Prediction

SiaDFP: A Disk Failure Prediction Framework Based on Siamese Neural Network in Large-Scale Data Center

Disk Failure Prediction based on Multi-layer Domain Adaptive Learning

Disk failure prediction based on association analysis and SSA-LSTM

Optimizing Efficiency of Machine Learning Based Hard Disk Failure Prediction by Two-Layer Classification-Based Feature Selection

SPAE: Lifelong disk failure prediction via end-to-end GAN-based anomaly detection with ensemble update

Hard Disk Failure Prediction Based on Blending Ensemble Learning

StreamDFP: A General Stream Mining Framework for Adaptive Disk Failure Prediction

Retracted: Convolution‐LSTM‐Based Mechanical Hard Disk Failure Prediction by Sensoring S.M.A.R.T. Indicators

Fast Proactive Repair in Erasure-Coded Storage: Analysis, Design, and Implementation

A Disk Failure Prediction Method Based on Active Semi-supervised Learning

SPAE: Lifelong Disk Failure Prediction via End-to-End GAN-Based Anomaly Detection with Ensemble Update

Minority Disk Failure Prediction Based on Transfer Learning in Large Data Centers of Heterogeneous Disk Systems

Cost‐efficiency disk failure prediction via threshold‐moving

A disk failure prediction method based on LSTM network due to its individual specificity

Incremental Prediction Model of Disk Failures Based on the Density Metric of Edge Samples

New Metrics for Disk Failure Prediction That Go Beyond Prediction Accuracy

Disk failure prediction model for storage systems based on disk SMART technology

Fatman: Building Reliable Archival Storage Based on Low-Cost Volunteer Resources
