DRAM Row Research Articles

DRAM cells require periodic refreshing to preserve their data. In JEDEC DDRx devices, refresh is performed via the auto-refresh command, which refreshes multiple rows in multiple banks simultaneously. The internal implementation of auto-refresh is completely opaque outside the DRAM: all the memory controller can do is instruct the DRAM to refresh itself, and the DRAM handles everything else, in particular determining which rows in which banks are refreshed. This conflicts with a large body of research on reducing refresh overhead, in which the memory controller needs fine-grained control over which regions of memory are refreshed. For example, prior works exploit the fact that a subset of DRAM rows can be refreshed at a slower rate than others because of variations in access rate or retention period. However, such row-granularity approaches cannot use the standard auto-refresh command, which refreshes an entire batch of rows at once and does not permit rows to be skipped. Consequently, prior schemes are forced to use explicit sequences of activate (ACT) and precharge (PRE) operations to mimic row-level refreshing. The drawback is that explicit ACT and PRE commands are inefficient in both performance and power compared with JEDEC's auto-refresh mechanism. In this paper, we show that even when skipping a high percentage of refresh operations, existing row-granularity refresh techniques are mostly ineffective because of this inherent efficiency gap between ACT/PRE and JEDEC auto-refresh. We propose a modification to the DRAM that extends its existing control-register access protocol to include the DRAM's internal refresh counter. We also introduce a new "dummy refresh" command that skips the refresh operation and simply increments the internal counter.
We show that these modifications allow a memory controller to eliminate as many refreshes as prior work, while achieving significant energy and performance advantages by using auto-refresh most of the time.
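The abstract does not spell out how a controller would combine auto-refresh, dummy refresh, and ACT/PRE. What follows is a minimal Python sketch under stated assumptions. It assumes a hypothetical readable refresh counter (`read_refresh_counter`), a fixed number of refresh batches, and a per-row "due" bitmap supplied by some retention- or access-aware policy. Under this assumed policy, a batch that is entirely due gets a normal REF, and a batch with nothing due gets a dummy refresh. A mixed batch falls back to ACT/PRE for its due rows, followed by a dummy refresh so the internal counter stays in step. This is an illustration of the idea, not necessarily the paper's exact algorithm.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DramModel:
    """Toy DRAM whose internal refresh counter the controller can read.

    All names here are hypothetical; they stand in for the proposed
    control-register access and "dummy refresh" command.
    """
    num_batches: int                      # refresh batches per retention window (assumed)
    counter: int = 0                      # internal counter: next batch that REF would refresh
    log: List[str] = field(default_factory=list)

    def read_refresh_counter(self) -> int:
        # Models reading the internal refresh counter via a control register.
        return self.counter

    def _advance(self) -> None:
        self.counter = (self.counter + 1) % self.num_batches

    def auto_refresh(self) -> None:
        # Standard REF: refresh the whole batch, then advance the counter.
        self.log.append(f"REF batch {self.counter}")
        self._advance()

    def dummy_refresh(self) -> None:
        # Proposed command: skip the refresh, only advance the counter.
        self.log.append(f"DUMMY batch {self.counter}")
        self._advance()

    def act_pre(self, batch: int, row: int) -> None:
        # Explicit ACT followed by PRE refreshes a single row.
        self.log.append(f"ACT/PRE batch {batch} row {row}")


def refresh_step(dram: DramModel, due: List[List[bool]]) -> None:
    """Handle the batch the DRAM's counter points at.

    due[b][r] is True if row r of batch b must be refreshed in this window.
    """
    batch = dram.read_refresh_counter()
    rows = due[batch]
    if all(rows):
        dram.auto_refresh()               # whole batch due: use efficient REF
    elif not any(rows):
        dram.dummy_refresh()              # nothing due: skip entirely
    else:
        for r, needed in enumerate(rows):
            if needed:
                dram.act_pre(batch, r)    # refresh only the due rows
        dram.dummy_refresh()              # keep the internal counter in sync


# Usage: three batches of two rows each.
due = [[True, True], [False, False], [True, False]]
dram = DramModel(num_batches=3)
for _ in range(3):
    refresh_step(dram, due)
print(dram.log)
# ['REF batch 0', 'DUMMY batch 1', 'ACT/PRE batch 2 row 0', 'DUMMY batch 2']
```

The command log shows the intended trade-off. Fully due or fully skippable batches cost one command each, and only mixed batches pay the per-row ACT/PRE cost.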

Power consumption and DRAM latency are serious concerns in modern chip-multiprocessor (CMP, or multi-core) based computing systems. The management of the DRAM row buffer can significantly impact both power consumption and latency. Modern DRAM systems read data from the cell arrays and populate a row buffer as large as 8 KB on each memory request, but only a small fraction of these bits is ever returned to the CPU. This wastes energy and time reading (and subsequently writing back) bits that are rarely used. Traditionally, an open-page policy has been used for uniprocessor systems, and it has worked well because of spatial and temporal locality in the access stream. In future multi-core processors, the possibly independent access streams of each core are interleaved, destroying the available locality and significantly under-utilizing the contents of the row buffer. In this work, we attempt to improve row-buffer utilization for future multi-core systems. The schemes presented here are motivated by our observation that a large number of accesses within heavily accessed OS pages go to small, contiguous "chunks" of cache blocks. Co-locating such chunks (from different OS pages) in a row buffer therefore improves the overall utilization of the row-buffer contents and consequently reduces memory energy consumption and access time. Such co-location can be achieved in several ways, notably by reducing the OS page size and by software- or hardware-assisted migration of data within DRAM. We explore these mechanisms and discuss the trade-offs involved, along with the energy and performance improvements of each scheme. On average, for applications with room for improvement, our best-performing scheme increases performance by 9% (max. 18%) and reduces memory energy consumption by 15% (max. 70%).
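The co-location argument can be made concrete with a toy model. The Python sketch below assumes a single bank, an open-page policy, 1 KB chunks (a hypothetical size), and a static remap table that packs the hottest chunks into a reserved DRAM region. It stands in for the reduced-page-size and migration mechanisms the abstract mentions and is not the authors' implementation. In the example, two cores each touch a small hot chunk in a different OS page. Interleaving their requests makes every access a row-buffer miss until the two chunks share a row.

```python
from collections import Counter
from typing import Callable, Dict, List

ROW_BYTES = 8 * 1024    # row-buffer size, per the 8 KB figure in the text
CHUNK_BYTES = 1024      # hypothetical chunk (micro-page) size


def row_hits(trace: List[int], row_of: Callable[[int], int]) -> int:
    """Count row-buffer hits for a single bank under an open-page policy."""
    open_row = None
    hits = 0
    for addr in trace:
        row = row_of(addr)
        if row == open_row:
            hits += 1
        open_row = row                 # the accessed row stays open afterwards
    return hits


def build_remap(trace: List[int], reserved_base: int, max_chunks: int) -> Dict[int, int]:
    """Pack the most-accessed chunks contiguously starting at reserved_base."""
    counts = Counter(addr // CHUNK_BYTES for addr in trace)
    return {
        chunk: reserved_base + slot * CHUNK_BYTES
        for slot, (chunk, _) in enumerate(counts.most_common(max_chunks))
    }


def remapped_row(remap: Dict[int, int], addr: int) -> int:
    """Row index of addr after hot chunks are moved into the reserved region."""
    chunk, offset = divmod(addr, CHUNK_BYTES)
    phys = remap[chunk] + offset if chunk in remap else addr
    return phys // ROW_BYTES


# Two cores each stream through one hot chunk in a different OS page;
# their requests interleave at the memory controller.
core_a = [0 + 64 * i for i in range(4)]          # chunk in row 0
core_b = [64 * 1024 + 64 * i for i in range(4)]  # chunk in row 8
trace = [addr for pair in zip(core_a, core_b) for addr in pair]

remap = build_remap(trace, reserved_base=1024 * 1024, max_chunks=2)
print(row_hits(trace, lambda a: a // ROW_BYTES))            # 0
print(row_hits(trace, lambda a: remapped_row(remap, a)))    # 7
```

A real scheme must also keep the reserved region free, pay for the migration itself, and account for multiple banks and channels. The sketch ignores all of these and only shows why co-locating hot chunks raises row-buffer hits for interleaved streams.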

Articles published on DRAM Row

Flexible auto-refresh

The dirty-block index

Flipping bits in memory without accessing them

Reducing DRAM row activations with eager read/write clustering

OWL

LOT-ECC

Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management

Micro-pages

DDR3 SDRAM with a Complete Predictor

PARBLO: Page-Allocation-Based DRAM Row Buffer Locality Optimization

DRAM Controller with a Complete Predictor

DRAM performance as a function of its structure and memory stream locality

Efficient use of memory bandwidth to improve network processor throughput

Designing a modern memory hierarchy with hardware prefetching

Hidden double data transfer scheme for MDL design