Abstract

The exponentially increasing performance of chip multiprocessors (CMPs) predicted by Moore's Law now comes from increasing core counts in the CMP rather than from raising the clock rate of a single CPU core. As the technology node shrinks, more transistors are integrated within the same footprint area to deliver higher performance. However, this is accompanied by higher power dissipation, which usually exceeds what inexpensive cooling techniques can handle. This Power Wall prevents the chip from running at full speed with all devices powered on, a limitation known as the dark silicon problem. Another major bottleneck in CMP development is the imbalance between the CPU clock rate and memory access speed. This Memory Wall keeps the CPU from fully utilizing its compute power. To address both the Power and Memory Walls, we propose a monolithic 3D hybrid architecture that consists of a multi-core CPU tier, a fine-grain dynamically reconfigurable (FDR) field-programmable gate array (FPGA) tier, and multiple resistive RAM (RRAM) tiers. The FDR tier serves as an accelerator and uses temporal logic folding to localize on-chip communication. The RRAM tiers are connected to the CPU and FDR tiers through an efficient memory interface that exploits the high bandwidth available from monolithic inter-tier vias and hides the latency of large data transfers. We evaluate the architecture on two types of benchmarks: compute-intensive and memory-intensive. For both types of applications, the architecture significantly reduces power and energy while improving performance. Compared to the baseline, our architecture achieves average speedups of 43.1× and 2.5× on compute-intensive and memory-intensive benchmarks, respectively. Power and energy consumption are reduced by 5.0× and 40.5×, respectively, for compute-intensive applications, and by 2.0× and 4.2×, respectively, for memory-intensive applications. This translates to an energy-delay product (EDP) improvement of 1745.3× for compute-intensive applications and 10.5× for memory-intensive applications.
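
As a rough check on the EDP figures, the sketch below assumes that EDP is energy multiplied by delay, so the improvement factor is the product of the energy reduction and the speedup. The function name `edp_improvement` is hypothetical and only the averaged ratios quoted in the abstract are used. For the compute-intensive case the product of the averages comes out near, but not exactly at, the reported 1745.3×; the small gap is presumably due to rounding or to averaging per benchmark, which the abstract does not specify.

```python
def edp_improvement(speedup: float, energy_reduction: float) -> float:
    """Return the EDP improvement factor.

    Assumes EDP = energy * delay, so the improvement is the product of
    the energy-reduction ratio and the speedup (delay-reduction ratio).
    """
    return speedup * energy_reduction


# Averaged ratios quoted in the abstract (baseline / proposed architecture).
compute = edp_improvement(speedup=43.1, energy_reduction=40.5)
memory = edp_improvement(speedup=2.5, energy_reduction=4.2)

print(f"compute-intensive EDP improvement: about {compute:.2f}x")
print(f"memory-intensive EDP improvement: about {memory:.2f}x")
```

The memory-intensive product matches the reported 10.5× directly, which suggests the EDP figures follow this relationship at least approximately.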
