Abstract

As computer memory increases in size and processors continue to get faster, the memory subsystem becomes an increasing bottleneck to system performance. To mitigate the relatively slow DRAM memory chip speeds, a new generation of 3D stacked DRAM is being developed, with lower power consumption and higher bandwidth. This paper proposes the use of 3D ring-based data fabrics for fast data transfer between these chips. The ring-based data fabric uses a fast standing wave oscillator to clock its transactions. With a fast clocking scheme, and multiple channels sharing the same bus, more channels are utilized while significantly reducing the number of through-silicon vias (TSVs). Experimental results show that our ring-based data fabric can reduce read latencies by almost 4X compared to traditional stacked memory chips. Variations of our scheme can also reduce power consumption compared to traditional memory stacks. Our Memory Architecture using a Ring-based Scheme (MARS) can effectively trade off power, throughput, and latency to improve system performance for different application spaces. We show that our MARS variants can deliver better latency (up to ~4X), power (up to ~8X), and performance per watt (up to ~4X) over HBM, when averaged over 11 SPEC CPU 2006 benchmarks. Other MARS variants provide higher throughput with similar power consumption compared to Wide I/O memory.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.