Abstract

To sustain the cadence of Moore's law and improve the area, delay, and power of integrated circuits, novel fabrication schemes such as parallel and monolithic 3D integration have recently been proposed. While parallel 3D does not enable very fine-grained vertical connections, monolithic 3D currently offers only a limited number of transistor tiers due to the high cost of the additional masks and processing steps. In our previous work, we introduced a novel 3D integration scheme called 3D Nanofabric. Inspired by the 3D NAND flash process, the flow consists of N identical vertical tiers where multiple vertical layers can be patterned at once, reducing the manufacturing cost significantly. In this paper, we propose to use our 3D Nanofabric flow to design low-footprint Multiply-And-Accumulate (MAC) units. Because a MAC unit can be designed using a regular array organization, we show how it can be spread across multiple vertical layers using the 3D Nanofabric flow while respecting the different layout constraints. Through circuit-level evaluations, we show that for a 64-input bit multiplier, the area and area-delay product are decreased by 21.0x and 16.7x, respectively, compared to a traditional 2D implementation in a 28 nm FDSOI technology, with only a 43% energy overhead. Additionally, we show how to build a systolic 3D MAC array aimed at convolutional neural networks. Through architectural evaluations, we demonstrate that when running VGG-16, our 3D MAC array can improve the TOPS/mm² by 2.8x compared to a TPU-like 2D systolic array.
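
The reported area and area-delay-product figures also imply a delay cost, which the abstract does not state directly. A minimal sketch of that arithmetic follows, assuming the area-delay product is the plain product of area and delay and that both reduction factors refer to the same multiplier design point; the function name is illustrative, not taken from the paper.

```python
def implied_delay_ratio(area_reduction: float, adp_reduction: float) -> float:
    """Return delay_3D / delay_2D implied by the two reduction factors.

    Assumption: ADP = area * delay, so
    ADP_2D / ADP_3D = (area_2D / area_3D) * (delay_2D / delay_3D).
    """
    return area_reduction / adp_reduction


# Factors quoted in the abstract: 21.0x smaller area, 16.7x smaller ADP.
ratio = implied_delay_ratio(21.0, 16.7)
print(f"implied 3D/2D delay ratio: {ratio:.2f}")
```

Under these assumptions, the 3D multiplier would be roughly a quarter slower than its 2D counterpart, which is why the area-delay-product gain is smaller than the area gain.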
