Abstract

With the development of flash memory technology in recent years, storage systems in large data centers are typically built on thousands or even millions of solid-state drives (SSDs). At that scale, SSD failures are inevitable. An SSD failure may cause unrecoverable data loss or make system services unavailable, with potentially catastrophic consequences. Active fault detection can identify device problems in advance and is therefore gaining popularity. Recent work has turned toward applying AI algorithms to SSD SMART data for fault detection. However, the SMART data of new SSDs contains a large number of features, and this high dimensionality degrades the accuracy of AI algorithms for fault detection. To tackle this problem, we modify the structure of the traditional autoencoder (AE) using Gated Recurrent Units (GRUs) and propose GAL, an SSD fault detection method based on dimensionality reduction with a GRU sparse autoencoder (GRUAE) that exploits the temporal characteristics of SSD SMART data. The proposed method first trains the GRUAE model on SSD SMART data and then uses the encoder of the trained model as a dimensionality reduction tool for the original high-dimensional SMART data. The aim is to reduce the influence of noisy features in the original SMART data and to highlight the features most relevant to the characteristics of the data, thereby improving the accuracy of fault detection. Finally, a long short-term memory (LSTM) network performs fault detection on the low-dimensional SMART data. Experimental results on a real SSD dataset from Alibaba show that dimensionality reduction with the proposed method improves the fault detection accuracy of various AI algorithms to varying degrees, and that GAL performs best among all methods evaluated.
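The dimensionality reduction step described above can be illustrated with a minimal, dependency-free sketch of a GRU encoder forward pass. This is not the authors' implementation: the class name `GRUEncoder`, the sizes (30 SMART features, 8 hidden units, a 7-step window), the omission of bias terms, and the choice to emit one low-dimensional code per time step are all assumptions made for illustration. The weights are random here, whereas in the described method they would be learned by training the full sparse autoencoder on SMART data.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUEncoder:
    """Hypothetical GRU encoder that maps each high-dimensional SMART
    vector in a window to a low-dimensional hidden state."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # Input weights (W) and recurrent weights (U) for the update gate "z",
        # the reset gate "r", and the candidate state "h". Biases are omitted.
        self.W = {g: rand_matrix(n_hidden, n_features, rng) for g in "zrh"}
        self.U = {g: rand_matrix(n_hidden, n_hidden, rng) for g in "zrh"}

    def _gate(self, name, x, h, activation):
        wx = matvec(self.W[name], x)
        uh = matvec(self.U[name], h)
        return [activation(a + b) for a, b in zip(wx, uh)]

    def step(self, x, h):
        z = self._gate("z", x, h, sigmoid)            # update gate
        r = self._gate("r", x, h, sigmoid)            # reset gate
        reset_h = [ri * hi for ri, hi in zip(r, h)]   # gated previous state
        cand = self._gate("h", x, reset_h, math.tanh) # candidate state
        # One common convention: z blends the old state with the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def encode(self, window):
        # Return one low-dimensional code per time step (an assumption),
        # so a downstream sequence model can still see the temporal axis.
        h = [0.0] * self.n_hidden
        codes = []
        for x in window:
            h = self.step(x, h)
            codes.append(h)
        return codes


# Synthetic stand-in for 7 consecutive SMART records with 30 features each.
rng = random.Random(1)
window = [[rng.random() for _ in range(30)] for _ in range(7)]

encoder = GRUEncoder(n_features=30, n_hidden=8)
codes = encoder.encode(window)
print(len(codes), len(codes[0]))  # prints: 7 8
```

In this sketch each 30-feature record becomes an 8-value code, and the resulting sequence of codes is the kind of low-dimensional input that a classifier such as an LSTM could consume. Training the encoder jointly with a decoder under a reconstruction loss and a sparsity penalty is left out to keep the example short.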

