The Reach Profiler (REAPER)

Minesh Patel, Jeremie S Kim, Onur Mutlu

doi:10.1145/3140659.3080242

Abstract

Modern DRAM-based systems suffer from significant energy and latency penalties due to conservative DRAM refresh standards. Volatile DRAM cells can retain information across a wide distribution of times ranging from milliseconds to many minutes, but each cell is currently refreshed every 64ms to account for the extreme tail end of the retention time distribution, leading to a high refresh overhead. Due to poor DRAM technology scaling, this problem is expected to get worse in future device generations. Hence, the current approach of refreshing all cells with the worst-case refresh rate must be replaced with a more intelligent design. Many prior works propose reducing the refresh overhead by extending the default refresh interval to a higher value, which we refer to as the target refresh interval, across parts or all of a DRAM chip. These proposals handle the small set of failing cells that cannot retain data throughout the entire extended refresh interval via retention failure mitigation mechanisms (e.g., error correcting codes or bit-repair mechanisms). This set of failing cells is discovered via retention failure profiling, which is currently a brute-force process that writes a set of known data to DRAM, disables refresh and waits for the duration of the target refresh interval, and then checks for retention failures across the DRAM chip. We show that this brute-force approach is too slow and is detrimental to system execution, especially with frequent online profiling. This paper presents reach profiling, a new methodology for retention failure profiling based on the key observation that an overwhelming majority of failing DRAM cells at a target refresh interval fail more reliably at both longer refresh intervals and higher temperatures. Using 368 state-of-the-art LPDDR4 DRAM chips from three major vendors, we conduct a thorough experimental characterization of the complex set of tradeoffs inherent in the profiling process. 
We identify three key metrics to guide design choices for retention failure profiling and mitigation mechanisms: coverage, false positive rate, and runtime. We propose reach profiling, a new retention failure profiling mechanism whose key idea is to profile failing cells at a longer refresh interval and/or higher temperature relative to the target conditions in order to maximize failure coverage while minimizing the false positive rate and profiling runtime. We thoroughly explore the tradeoffs associated with reach profiling and show that there is significant room for improvement in DRAM retention failure profiling beyond the brute-force approach. We show with experimental data that on average, by profiling at 250ms above the target refresh interval, our first implementation of reach profiling (called REAPER) can attain greater than 99% coverage of failing DRAM cells with less than a 50% false positive rate while running 2.5x faster than the brute-force approach. In addition, our end-to-end evaluations show that REAPER enables significant system performance improvement and DRAM power reduction, outperforming the brute-force approach and enabling high-performance operation at longer refresh intervals that were previously unreasonable to employ due to the high associated profiling overhead.
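The tradeoff among the three metrics can be illustrated with a small sketch. It is a toy model and not the authors' implementation: the `Cell` class, the per-cell retention ranges, the 1024 ms target interval, and the pessimistic rule that one profiling round catches only reliably failing cells are all assumptions made for illustration. Only the 250 ms reach offset is taken from the abstract. The metric definitions used here (coverage as the found fraction of cells that can fail at the target interval, false positive rate as the fraction of reported cells that cannot fail at the target interval) are one plausible reading of the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical DRAM cell whose retention time varies within a range."""
    cell_id: int
    min_retention_ms: float  # shortest retention the cell ever exhibits
    max_retention_ms: float  # longest retention the cell ever exhibits


def can_fail(cell: Cell, interval_ms: float) -> bool:
    """The cell may lose data at least sometimes at this refresh interval."""
    return cell.min_retention_ms < interval_ms


def fails_reliably(cell: Cell, interval_ms: float) -> bool:
    """The cell loses data every time at this refresh interval."""
    return cell.max_retention_ms < interval_ms


def profile_once(cells, interval_ms):
    """One profiling round under a pessimistic assumption:
    only reliably failing cells are observed; intermittent ones escape."""
    return {c.cell_id for c in cells if fails_reliably(c, interval_ms)}


def coverage(found, truth):
    """Fraction of truly failing cells that the profiler reported."""
    return len(found & truth) / len(truth) if truth else 1.0


def false_positive_rate(found, truth):
    """Fraction of reported cells that do not fail at the target interval."""
    return len(found - truth) / len(found) if found else 0.0


# Made-up retention ranges (ms); not measured data.
cells = [
    Cell(0, 900, 1100),   # intermittent at the target interval
    Cell(1, 700, 800),    # reliable failure at the target interval
    Cell(2, 1000, 1200),  # intermittent at the target interval
    Cell(3, 1100, 1250),  # safe at target, fails at the reach interval
    Cell(4, 2000, 2500),  # safe at both intervals
    Cell(5, 500, 600),    # reliable failure at the target interval
]

TARGET_MS = 1024.0            # assumed target refresh interval
REACH_MS = TARGET_MS + 250.0  # profile 250 ms above the target

# Ground truth: every cell that can fail at the target interval.
truth = {c.cell_id for c in cells if can_fail(c, TARGET_MS)}

brute = profile_once(cells, TARGET_MS)  # single round at target conditions
reach = profile_once(cells, REACH_MS)   # single round at reach conditions

print("brute force:", coverage(brute, truth), false_positive_rate(brute, truth))
print("reach:      ", coverage(reach, truth), false_positive_rate(reach, truth))
```

On this made-up data, a single round at the target interval misses the two intermittent cells, while a single round at the longer interval catches them at the cost of one false positive. A brute-force profiler would need additional rounds to catch the intermittent cells, which is where the runtime difference described in the abstract comes from.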


Journal: ACM SIGARCH Computer Architecture News
Publication Date: Jun 24, 2017
Citations: 30

Similar Papers

The Reach Profiler (REAPER)
Minesh Patel ... Jeremie S Kim
24 Jun 2017

The efficacy of error mitigation techniques for DRAM retention failures
Samira Khan ... Donghyuk Lee
16 Jun 2014

The efficacy of error mitigation techniques for DRAM retention failures
Samira Khan ... Onur Mutlu
ACM SIGMETRICS Performance Evaluation Review | VOL. 42
16 Jun 2014

Understanding Latency Variation in Modern DRAM Chips
Kevin K Chang ... Onur Mutlu
14 Jun 2016
