Abstract

Chipkill correct is an advanced type of error correction used in memory subsystems. Existing analytical approaches for modeling the reliability of memory subsystems with chipkill correct are limited to chipkill-correct solutions that guarantee correction of errors in a single DRAM device. However, stronger chipkill-correct solutions, capable of guaranteeing the detection and even the correction of errors in up to two DRAM devices, have become common in current HPC systems. Analytical reliability models are needed for such memory subsystems. This paper proposes analytical models for the reliability of double-chipkill detect and/or correct. Validation against Monte Carlo simulations shows that the outputs of our analytical models are within 3.9% of the Monte Carlo results, on average. We used the analytical models to study various aspects of the reliability of memory subsystems protected by double-chipkill detect and/or correct. Our studies provide several insights into how the reliability of these systems depends on scale, device fault rate, memory organization, and memory-scrubbing policy.
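
The following minimal Python sketch illustrates the kind of calculation an analytical reliability model performs and how it can be checked against Monte Carlo simulation. Its assumptions are deliberately simplistic and are not taken from the paper. Device faults are independent and arrive as a Poisson process at a constant FIT rate. All faults are transient and are cleared at every scrub, so an error is treated as uncorrectable only if more than two devices in the same rank become faulty within one scrub interval. The function names and the 18-device rank are hypothetical.

```python
import math
import random

FIT_SCALE_HOURS = 1e9  # FIT = failures per 10^9 device-hours


def device_fault_prob(fit_rate: float, interval_hours: float) -> float:
    """Probability that one device suffers at least one fault in the interval,
    assuming Poisson fault arrivals at a constant FIT rate."""
    lam = fit_rate / FIT_SCALE_HOURS  # faults per device-hour
    return 1.0 - math.exp(-lam * interval_hours)


def analytic_uncorrectable_prob(n_devices: int, p: float,
                                max_correctable: int = 2) -> float:
    """Binomial model: probability that more than `max_correctable` of the
    `n_devices` in a rank are faulty in the same interval."""
    within_budget = sum(
        math.comb(n_devices, k) * p**k * (1.0 - p) ** (n_devices - k)
        for k in range(max_correctable + 1)
    )
    return 1.0 - within_budget


def monte_carlo_uncorrectable_prob(n_devices: int, p: float, trials: int,
                                   max_correctable: int = 2,
                                   seed: int = 0) -> float:
    """Estimate the same probability by sampling each device independently."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        faulty = sum(1 for _ in range(n_devices) if rng.random() < p)
        if faulty > max_correctable:
            failures += 1
    return failures / trials


if __name__ == "__main__":
    n, p = 18, 0.05  # hypothetical rank size and per-interval fault probability
    exact = analytic_uncorrectable_prob(n, p)
    mc = monte_carlo_uncorrectable_prob(n, p, trials=100_000)
    rel_err = abs(mc - exact) / exact
    print(f"analytic={exact:.5f}  monte_carlo={mc:.5f}  rel_err={rel_err:.2%}")
```

With 18 devices and a per-interval fault probability of 0.05, the binomial estimate is about 0.058, and the Monte Carlo estimate should agree to within sampling error. A realistic model would also need to handle multiple fault modes, permanent faults that persist across scrubs, and the difference between detecting and correcting two-device errors. This sketch ignores all of them.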
