NAND flash memory has gained popularity in a wide variety of digital storage systems. Despite its excellent performance, NAND flash memory suffers from various reliability problems. In recent years, researchers have tried to predict flash failures using machine-learning models. However, machine-learning-based failure prediction faces a key problem: an imbalance between robustness and portability. When a prediction model is applied to different flash chips, its performance degrades as the error characteristics vary. To adapt to this variation, the model must be rebuilt to maintain prediction performance, and the overhead of rebuilding makes such adaptation costly. To overcome these challenges, we present LightWarner, an easily applicable predictor based on model-free reinforcement learning algorithms. LightWarner learns error characteristics dynamically during the flash lifetime without pre-training. We evaluate LightWarner on six types of 3D flash chips. The results show that LightWarner achieves an F1 score of over 93% across different flash chips, about 10% higher than supervised machine-learning methods. LightWarner also adapts to variation in error characteristics with low migration costs.