Abstract

Memory bandwidth is one of the major performance bottlenecks for chip multiprocessors (CMPs), which continue to integrate an increasing number of cores with the help of Moore's Law. The growing disparity between the CPU clock rate and off-chip memory access speed is known as the Memory Wall. This problem has been actively studied over the past two decades. One way to address it is to place memory closer to the processor, for example by stacking the memory directly on top of a CMP, thereby significantly reducing the interconnect latency between them. However, previous 3D-stacked memory architectures use through-silicon via (TSV)-based three-dimensional (3-D) integration, which bonds multiple dies with TSVs that have diameters in the 1-5 μm range. Unlike TSV-based 3-D integration, monolithic 3-D integration builds device tiers sequentially on a single substrate. Different tiers are connected using monolithic inter-tier vias (MIVs), whose diameter (around 50 nm) is the same as that of a local via. Main memory typically consists of DRAM, which is volatile and thus requires periodic refresh to maintain the stored data. This increases both the energy consumption and the access latency. In contrast, various nonvolatile RAMs (NVRAMs) have emerged as possible universal memory technologies, which promise low power, fast read access, high density, and nonvolatility. In this paper, we present an efficient memory interface for monolithic 3D-stacked RAM (both DRAM and NVRAMs such as resistive RAM and nanotube RAM). It takes advantage of the tremendous bandwidth made available by MIVs to implement an on-chip memory bus in order to hide the latency of large data transfers. We propose a multientry row-based write buffer to increase the buffer hit rate and reduce the number of memory core accesses. We decouple read and write accesses using extra interconnects available through MIVs to increase memory throughput. We also present an adaptive power-down policy to maintain a balance between energy efficiency and performance. Simulation results show that the proposed architecture can achieve both high performance and energy efficiency, and is thus attractive for low-power/high-performance computing.
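The abstract does not describe how the multientry row-based write buffer is organized, so the following is only an illustrative toy model of the general idea: buffering writes at row granularity so that repeated writes to a buffered row are absorbed instead of reaching the memory core. The class name `RowWriteBuffer`, the 1024-byte row size, the LRU eviction rule, and the `simulate` helper are all assumptions made for this sketch and are not taken from the paper.

```python
from collections import OrderedDict


class RowWriteBuffer:
    """Toy multi-entry write buffer keyed by memory row (hypothetical model).

    Assumptions (not from the paper): fixed row size, LRU eviction,
    and one memory-core write per evicted or flushed row.
    """

    def __init__(self, num_entries=4, row_bytes=1024):
        self.num_entries = num_entries
        self.row_bytes = row_bytes
        self.entries = OrderedDict()  # row index -> {offset: value}
        self.hits = 0                 # writes absorbed by a buffered row
        self.misses = 0               # writes that allocated a new entry
        self.core_writes = 0          # rows written back to the memory core

    def write(self, addr, value):
        row, offset = divmod(addr, self.row_bytes)
        if row in self.entries:
            self.hits += 1
            self.entries.move_to_end(row)  # mark row as most recently used
        else:
            self.misses += 1
            if len(self.entries) >= self.num_entries:
                self.entries.popitem(last=False)  # evict least recently used row
                self.core_writes += 1
            self.entries[row] = {}
        self.entries[row][offset] = value

    def flush(self):
        """Write every buffered row back to the memory core."""
        self.core_writes += len(self.entries)
        self.entries.clear()


def simulate(addresses, num_entries):
    """Replay a write trace and return (buffer hits, memory-core row writes)."""
    buf = RowWriteBuffer(num_entries=num_entries)
    for addr in addresses:
        buf.write(addr, 0)
    buf.flush()
    return buf.hits, buf.core_writes
```

In this model, a trace that alternates between two rows thrashes a single-entry buffer, while a two-entry buffer absorbs every repeated write, which is the intuition behind using multiple entries to raise the hit rate and cut memory core accesses.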
