Abstract
Phase change memory (PCM) is a promising technology that can offer higher capacity than DRAM. Unfortunately, PCM's access latency and energy are higher than DRAM's and its endurance is lower. Many DRAM-PCM hybrid memory systems use DRAM as a cache to PCM, to attain DRAM's low access latency and energy and high endurance while taking advantage of PCM's large capacity. A key question is what data to cache in DRAM to best exploit the advantages of each technology while avoiding its disadvantages as much as possible. We propose a new caching policy that improves hybrid memory performance and energy efficiency. Our observation is that both DRAM and PCM banks employ row buffers that act as a cache for the most recently accessed memory row. Accesses that are row buffer hits incur similar latencies (and energy consumption) in DRAM and PCM, whereas accesses that are row buffer misses incur longer latencies (and higher energy consumption) in PCM. To exploit this, we devise a policy that avoids accessing, in PCM, data that frequently causes row buffer misses, because such accesses are costly in terms of both latency and energy. Our policy tracks the row buffer miss counts of recently used rows in PCM and caches in DRAM the rows that are predicted to incur frequent row buffer misses. The policy also takes PCM's high write latencies into account, in addition to row buffer locality. Compared to a conventional DRAM-PCM hybrid memory system, our row buffer locality-aware caching policy improves system performance by 14% and energy efficiency by 10% on data-intensive server and cloud-type workloads. The proposed policy achieves a 31% performance gain over an all-PCM memory system and comes within 29% of the performance of an all-DRAM memory system (not taking PCM's capacity benefit into account) on the evaluated workloads.
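As a rough illustration of the caching decision described above, the minimal Python sketch below tracks per-row miss counts and flags rows for migration to DRAM. It is not the paper's hardware mechanism; the threshold, the stats-store capacity, and all names are assumptions made for this example.

```python
# Hypothetical sketch of a row buffer locality-aware caching decision.
# MISS_THRESHOLD, STATS_CAPACITY, and all names are illustrative
# assumptions, not the hardware structure used in the paper.
from collections import OrderedDict

MISS_THRESHOLD = 2   # assumed: cache a row after this many row buffer misses
STATS_CAPACITY = 16  # assumed: only recently used PCM rows are tracked

class RowStats:
    def __init__(self):
        # LRU-ordered map: PCM row id -> row buffer miss count
        self.miss_counts = OrderedDict()

    def should_cache(self, row_id, row_buffer_hit):
        """Called on each PCM access; returns True if the row should be
        migrated to (cached in) DRAM."""
        if row_buffer_hit:
            # Hits are about as cheap in PCM as in DRAM; no reason to migrate.
            return False
        count = self.miss_counts.pop(row_id, 0) + 1
        self.miss_counts[row_id] = count          # move to MRU position
        if len(self.miss_counts) > STATS_CAPACITY:
            self.miss_counts.popitem(last=False)  # evict LRU entry
        if count >= MISS_THRESHOLD:
            # Predicted to keep missing the row buffer: cache it in DRAM.
            del self.miss_counts[row_id]
            return True
        return False
```

Under these assumed parameters, two row buffer misses to the same row trigger a migration, while a row that is accessed often but almost always hits in the row buffer stays in PCM.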
Highlights
Multiprogrammed workloads on chip multiprocessors require large amounts of main memory to support the working sets of many concurrently executing threads
We propose a Row Buffer Locality-Aware (RBLA) caching mechanism that places rows with low row buffer locality in Dynamic Random Access Memory (DRAM), so that they benefit from DRAM's lower array access latency and energy compared to Phase Change Memory (PCM)
FREQ-Dyn performs more data migrations to DRAM than RBLA-Dyn (RBLA-Dyn does not migrate frequently accessed data to DRAM unless that data is responsible for frequent row buffer misses), increasing its ability to serve writes in the DRAM cache; the two triggering conditions are sketched below
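To make the contrast in the last highlight concrete, here is a minimal sketch of the two triggering conditions. The function names and threshold values are assumptions for illustration only, not the mechanisms' actual parameters.

```python
# Illustrative migration triggers (thresholds are assumed values).

def freq_should_migrate(access_count, access_threshold=4):
    # FREQ-style: any sufficiently frequently accessed row is migrated.
    return access_count >= access_threshold

def rbla_should_migrate(miss_count, miss_threshold=2):
    # RBLA-style: only rows whose accesses keep missing the row buffer
    # are migrated, since only those pay PCM's higher array access cost.
    return miss_count >= miss_threshold

# A hot row with high row buffer locality (10 accesses, only 1 miss)
# triggers a FREQ migration but not an RBLA migration:
assert freq_should_migrate(access_count=10)
assert not rbla_should_migrate(miss_count=1)
```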
Summary
Multiprogrammed workloads on chip multiprocessors require large amounts of main memory to support the working sets of many concurrently executing threads. This memory demand is increasing today as the number of cores on a chip continues to increase and data-intensive applications become more widespread.

In both DRAM and PCM, cells (memory elements) are typically laid out in arrays of rows (cells sharing a common word line) and columns (cells sharing a common bit line). To read from the array, a word line is first asserted to select a row of cells. Through the bit lines, the selected cells' contents are detected by sense amplifiers (S/A) and latched in peripheral circuitry known as the row buffer.
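A toy bank model makes the row buffer's effect concrete. The latency values below are placeholders, assumed only to mirror the qualitative relationship stated in the abstract: row buffer hits cost about the same in DRAM and PCM, while misses cost much more in PCM.

```python
# Toy model of a memory bank with a single row buffer. Latencies are
# placeholder values in arbitrary relative units (assumptions, not
# measured numbers); only their ordering reflects the paper's point.

LATENCY = {
    ("DRAM", "hit"): 1.0, ("DRAM", "miss"): 2.0,
    ("PCM",  "hit"): 1.0, ("PCM",  "miss"): 5.0,  # miss pays PCM array access
}

class Bank:
    def __init__(self, technology):
        self.technology = technology
        self.open_row = None  # row currently latched in the row buffer

    def access(self, row_id):
        """Return the access latency; a miss replaces the buffered row."""
        kind = "hit" if row_id == self.open_row else "miss"
        self.open_row = row_id
        return LATENCY[(self.technology, kind)]

# Repeated accesses to the same row hit the row buffer in both
# technologies; alternating rows miss, which is far costlier in PCM.
dram, pcm = Bank("DRAM"), Bank("PCM")
print(sum(dram.access(r) for r in [0, 0, 1, 1]))  # 6.0: 2 misses + 2 hits
print(sum(pcm.access(r)  for r in [0, 0, 1, 1]))  # 12.0: same pattern in PCM
```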