Abstract

NAND flash memory – ubiquitous in today’s world of smart phones, SSDs (solid state drives), and cloud storage – has a number of well-known reliability problems. NAND data contains bit errors, which require the use of error correcting codes (ECCs). The raw bit error rate (RBER) increases with program-erase (P-E) cycling, and the number of P-E cycles the device can withstand before the RBER exceeds the ECC capability is called its endurance. ECC operates on data stored in a sector of NAND, and there is a large variation in the endurance of sectors within a device and across devices, resulting in excessively conservative endurance specifications. This research shows, for the first time, that a sector’s true endurance can be predicted with remarkable accuracy, using a combination of the sector’s location within the device and measurements taken at the very beginning of life. Real-world data is gathered on millions of NAND sectors using a custom-built test platform. Optimised machine learning classification models are built from the raw data to predict whether a sector will pass or fail against a fixed ECC threshold once a target P-E cycling level has been reached. A novel technique is demonstrated that uses different ECC thresholds for model training and testing, which allows the models to be tuned so that they never misclassify samples that would fail. This eliminates ECC failures and data loss, allowing simpler, less expensive ECC schemes to be used for modern NAND devices. It also enables significant endurance extensions to be achieved.
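The train/test threshold idea can be illustrated with a deliberately small sketch. It is not the paper's pipeline: the single early-life feature, the synthetic data, the one-feature decision-stump classifier (standing in for the optimised models the paper builds), and the values `ECC_THRESHOLD` and `STRICT_THRESHOLD` are all hypothetical. Training labels come from a stricter threshold than the ECC threshold used for testing, which pushes the learned cut-off toward predicting "fail" for borderline sectors.

```python
import random

def labels(eol_errors, threshold):
    # 1 = pass (end-of-life bit errors within the threshold), 0 = fail
    return [1 if e <= threshold else 0 for e in eol_errors]

def fit_stump(xs, ys):
    # Hypothetical one-feature classifier: "predict pass iff x <= c".
    # Picks the smallest cut-off c that maximises training accuracy.
    best_cut, best_correct = None, -1
    for c in sorted(set(xs)):
        correct = sum((x <= c) == (y == 1) for x, y in zip(xs, ys))
        if correct > best_correct:
            best_cut, best_correct = c, correct
    return best_cut

def missed_failures(cut, xs, ys):
    # Failing sectors (y == 0) wrongly predicted to pass: the costly error type.
    return sum(1 for x, y in zip(xs, ys) if x <= cut and y == 0)

def experiment(train_threshold, test_threshold, seed=0):
    rng = random.Random(seed)
    # Synthetic, illustrative data only (not NAND measurements): one early-life
    # feature per sector and an end-of-life error count that loosely tracks it.
    early = [rng.randint(0, 20) for _ in range(2000)]
    eol = [3 * x + rng.randint(0, 15) for x in early]
    train_x, test_x = early[:1000], early[1000:]
    cut = fit_stump(train_x, labels(eol[:1000], train_threshold))
    # Testing always uses the real ECC threshold.
    return cut, missed_failures(cut, test_x, labels(eol[1000:], test_threshold))

ECC_THRESHOLD = 40     # hypothetical correctable-error limit applied at test time
STRICT_THRESHOLD = 30  # hypothetical stricter limit used only for training labels

if __name__ == "__main__":
    for t in (ECC_THRESHOLD, STRICT_THRESHOLD):
        cut, missed = experiment(t, ECC_THRESHOLD)
        print(f"train threshold {t}: cut-off {cut}, missed failures {missed}")
```

Because the stricter training labels can only turn passes into fails, the accuracy-maximising cut-off can only move down, so missed failures on the test set can only stay the same or fall. The price is that more good sectors are flagged as failing.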

Highlights

  • NAND flash is a type of non-volatile memory that has seen an explosion in growth over the last 25 years, as the world’s data storage requirements have grown exponentially

  • The fraction of bits in error is known as the raw bit error rate (RBER), and if the RBER exceeds the capability of the error correcting code (ECC) engine, an uncorrectable error occurs

  • The purpose of this paper is to investigate if the RBER associated with each sector at the end of life can be predicted, based on a combination of sector address and measurements taken at the beginning of life
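The RBER definition and the uncorrectable-error condition in the highlights reduce to a few lines of arithmetic. In this sketch the 1 KiB sector size, the 40-bit correction capability, and the error counts are hypothetical values chosen for illustration. For a fixed sector size, comparing the RBER against the ECC capability is equivalent to comparing the sector's bit-error count against the number of bits the code can correct.

```python
def rber(bit_errors: int, bits_read: int) -> float:
    # Raw bit error rate: fraction of bits read back in error.
    if bits_read <= 0:
        raise ValueError("bits_read must be positive")
    return bit_errors / bits_read

def is_uncorrectable(bit_errors: int, ecc_correctable_bits: int) -> bool:
    # Assumes a code that corrects up to ecc_correctable_bits errors per sector;
    # any sector with more errors than that is an uncorrectable error.
    return bit_errors > ecc_correctable_bits

SECTOR_BITS = 8 * 1024     # hypothetical 1 KiB ECC sector
ECC_CORRECTABLE_BITS = 40  # hypothetical per-sector correction capability

errors_per_sector = [3, 17, 40, 41]  # illustrative counts, not measured data
for e in errors_per_sector:
    status = "uncorrectable" if is_uncorrectable(e, ECC_CORRECTABLE_BITS) else "ok"
    print(f"{e} errors: RBER={rber(e, SECTOR_BITS):.2e} -> {status}")
```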


Summary

INTRODUCTION

NAND flash is a type of non-volatile memory that has seen an explosion in growth over the last 25 years, as the world’s data storage requirements have grown exponentially. Neither program disturb nor read disturb damages flash cells, so neither determines the end-of-life point of a NAND device. Instead, the end-of-life point is determined by the amount of charge trapped in each cell as a result of P-E cycling. For a given ratio of parity bits to data bits (which fixes the code rate) and a fixed RBER, low-density parity-check (LDPC) codes achieve a lower codeword error rate (CWER) than Bose–Chaudhuri–Hocquenghem (BCH) codes. This increase in ECC performance is counterbalanced by serious challenges associated with the implementation and operation of LDPC [38]. These significant challenges mean that only the most sophisticated and well-resourced integrators of NAND are capable of implementing an LDPC solution.
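To make the parity-to-data ratio concrete, the arithmetic below sizes a binary BCH code using the standard bound that a t-error-correcting binary BCH code over GF(2^m) needs at most m·t parity bits, with the codeword limited to 2^m − 1 bits. The 1 KiB sector and t = 40 are hypothetical values, not taken from the paper, and real controllers may use shortened codes with slightly different parity counts.

```python
def bch_field_degree(data_bits: int, t: int) -> int:
    # Smallest m such that data bits plus at most m*t parity bits
    # fit in a codeword of length 2**m - 1.
    m = 1
    while data_bits + m * t > 2 ** m - 1:
        m += 1
    return m

def bch_parity_bits(data_bits: int, t: int) -> int:
    # Upper bound on parity bits for a t-error-correcting binary BCH code.
    return bch_field_degree(data_bits, t) * t

def code_rate(data_bits: int, parity_bits: int) -> float:
    # Code rate: fraction of the codeword that carries user data.
    return data_bits / (data_bits + parity_bits)

DATA_BITS = 8 * 1024  # hypothetical 1 KiB sector
T = 40                # hypothetical correction capability (bits per sector)

parity = bch_parity_bits(DATA_BITS, T)
print(f"m={bch_field_degree(DATA_BITS, T)}, parity={parity} bits, "
      f"parity/data={parity / DATA_BITS:.4f}, code rate={code_rate(DATA_BITS, parity):.4f}")
```

Raising t lowers the code rate for the same sector size, which is the trade-off that lets a stronger code (or a code with a better CWER at the same rate, such as LDPC) tolerate a higher RBER.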

RELATED RESEARCH
RESEARCH OBJECTIVES AND QUESTIONS
RESEARCH QUESTIONS
EXPERIMENTAL DESIGN
DATA COLLECTION
MACHINE LEARNING METHOD
INITIAL PREDICTION MODEL
MODEL IMPROVEMENT
MODEL TUNING
RESULTS AND DISCUSSION
FINAL PREDICTION SYSTEM
Findings
CONCLUSIONS