Abstract

With technology scaling, emerging nonvolatile devices, and data-intensive applications, memory faults have become a major reliability concern for computing systems. Various hardware and software approaches have been proposed to address this issue, and a comprehensive evaluation is required to understand their effectiveness. Given the complex nature of memory faults and the interactions between correction mechanisms, we propose MEMRES, a fast main memory system reliability simulator. It enables memory fault simulation with error-correcting code (ECC) algorithms and modern memory reliability management techniques, including memory page retirement, mirroring, scrubbing, and hardware sparing. MEMRES is computationally efficient in obtaining memory failure probabilities in the presence of multiple failure mechanisms and complex correction schemes, which allows optimizing memory system reliability, predicting the reliability of emerging memories, and designing reliability enhancement techniques. The accuracy of MEMRES is verified against an existing analytical model and an existing memory fault simulator. We performed a case study on spin-transfer torque random access memory (STT-RAM)-based main memory, and the results indicate that in-memory ECC can significantly mitigate the write error rate of STT-RAM, demonstrating the capability of MEMRES to handle emerging memory systems.
