Abstract

With the growth of intelligent applications and services such as real-time video surveillance systems, mobile edge computing, and the Internet of Things (IoT), technology is deeply involved in our daily lives. However, the reliability of these systems cannot always be guaranteed because of hard disk drive (HDD) failures in edge nodes. Frequent read/write operations and hazardous edge environments make maintenance even harder. HDD failure prediction is a scalable, low-overhead, proactive fault-tolerance approach to improving device reliability. In this paper, we propose an HDD failure prediction model based on a long short-term memory (LSTM) recurrent neural network, which exploits the long-term temporal dependencies in drive health data to improve prediction efficiency. In addition, we design a new health degree evaluation method that captures both a drive's current health status and its deterioration. Comprehensive experiments on two real-world hard drive datasets demonstrate that the proposed approach achieves good prediction accuracy with low overhead.
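
An LSTM consumes each drive's health data as a sequence, so the daily SMART records must first be cut into fixed-length samples. The sketch below shows one common way to do this, a sliding window. It is not taken from the paper; it assumes one record per drive per day (oldest first), and `sliding_windows` and `window` are hypothetical names.

```python
from typing import List, Sequence


def sliding_windows(records: Sequence[Sequence[float]], window: int) -> List[List[Sequence[float]]]:
    """Split one drive's daily SMART records (oldest first) into overlapping
    fixed-length windows; each window is one candidate LSTM input sequence.

    `window` is a hypothetical parameter: the number of consecutive days per sample.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # Slide one day at a time; a drive with fewer than `window` days yields no samples.
    return [list(records[i:i + window]) for i in range(len(records) - window + 1)]


# Five days of two hypothetical features (e.g. normalized RSC and SER values).
days = [[100, 98], [100, 97], [99, 97], [97, 95], [95, 94]]
samples = sliding_windows(days, window=3)
print(len(samples))  # 3
print(samples[0])    # [[100, 98], [100, 97], [99, 97]]
```

A larger window lets the model see a longer stretch of a drive's history per sample, at the cost of fewer samples per drive and more computation per prediction.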

Highlights

  • To improve the accuracy of training sample labeling, we propose a novel health degree evaluation approach that considers both the time-sequence features and the drive health status to comprehensively depict drive deterioration.

  • Two datasets are used in our experiments: one is from the Baidu data center [35] and the other is from the Backblaze storage system [36]. The first dataset contains 23,395 enterprise-class hard drives, consisting of 433 failed drives and 22,962 good drives. These drives are all of the same model.

  • Our prediction model has several parameters to optimize: the number of layers in the LSTM-RNN-based model, the size of the sliding window, and the threshold. The experimental results in this subsection are based on the “Baidu” family; the results for the other families are similar and are omitted for space.
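
This excerpt does not give the paper's health degree formula. As a purely hypothetical stand-in, the sketch below labels each day with a health degree in [0, 1] that falls linearly to 0 over the last `horizon` days before failure. It raises an alarm when any predicted degree drops below a tunable `threshold`. Unlike the proposed method, this sketch uses only time-to-failure and ignores SMART-based health status. `health_labels`, `raise_alarm`, and `horizon` are illustrative names.

```python
from typing import List


def health_labels(num_days: int, failed: bool, horizon: int) -> List[float]:
    """Assign a health degree in [0, 1] to each day of a drive's history (oldest first).

    Hypothetical scheme: good drives stay at 1.0; for a failed drive whose last
    record is the failure day, the degree falls linearly to 0.0 over the final
    `horizon` days.
    """
    labels = []
    for day in range(num_days):
        days_to_failure = num_days - 1 - day
        if failed and days_to_failure < horizon:
            labels.append(days_to_failure / horizon)
        else:
            labels.append(1.0)
    return labels


def raise_alarm(predicted_degrees: List[float], threshold: float) -> bool:
    """Flag the drive if any predicted health degree drops below `threshold`."""
    return any(h < threshold for h in predicted_degrees)


print(health_labels(6, failed=True, horizon=4))     # [1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
print(raise_alarm([0.9, 0.8, 0.4], threshold=0.5))  # True
```

Raising the threshold tends to catch more failing drives earlier but also flags more healthy ones. This is why the threshold is tuned together with the window size and the number of layers.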



Introduction

SMART is a self-monitoring system that collects and reports various performance indicators of HDDs and is supported by almost all HDD manufacturers [5]. SMART tracks up to 30 internal drive attributes, such as reallocated sector count (RSC), spin-up time (SUT), and seek error rate (SER). Every attribute has five fields: raw data, value, threshold, worst value, and status. The raw data are the values measured by a sensor or a counter. The value is the normalized form of the current raw data; the algorithm for computing it is defined by each HDD manufacturer and differs between manufacturers. SMART issues a failure alarm to the user when the value of any attribute crosses its given threshold, at which point the attribute is flagged as a warning.
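
To make the five fields and the threshold alarm concrete, here is a minimal sketch of one SMART attribute record. It assumes the common convention, used for example by smartmontools, that normalized values decrease as a drive degrades. Under that convention, an attribute trips when its value reaches or falls below the threshold. The class name, the extra `name` field, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One SMART attribute with the five fields described above (plus a display name)."""
    name: str       # e.g. "Reallocated Sector Count" (illustrative extra field)
    raw: int        # raw data: value measured by a sensor or counter
    value: int      # normalized current value (vendor-specific algorithm)
    threshold: int  # vendor-defined failure threshold
    worst: int      # worst (lowest) normalized value recorded so far
    status: str = "OK"

    def crossed_threshold(self) -> bool:
        # Assumed convention: normalized values fall as the drive degrades,
        # so the attribute trips once its value is at or below the threshold.
        return self.value <= self.threshold


attrs = [
    SmartAttribute("Reallocated Sector Count", raw=2048, value=5, threshold=10, worst=5),
    SmartAttribute("Spin Up Time", raw=4500, value=97, threshold=21, worst=95),
]
alarmed = [a.name for a in attrs if a.crossed_threshold()]
print(alarmed)  # ['Reallocated Sector Count']
```

Because the alarm fires only once a value crosses its threshold, it is a reactive signal. This motivates learning-based prediction from the full history of attribute values.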
