Modern applications require hardware accelerators to maintain energy efficiency while satisfying increasing computation requirements. However, with evolving standards, rapidly changing algorithmic complexity, and rising design costs at advanced technology nodes, the iterative development of inflexible accelerators for such applications becomes ineffective. Reconfigurable architectures can provide higher throughput and the required flexibility, but with substantial energy- and area-efficiency overhead relative to accelerators. We develop a domain-specific, energy- and area-efficient (within 2× to 10× of accelerators), multiprogram, runtime-reconfigurable accelerator called the universal digital signal processor (UDSP). The design maximizes generality and resource utilization for signal processing and linear algebra with minimal area and energy penalty. The statistics-driven multilayer network minimizes network delay and consists of an optimized switchbox design that maximizes the connectivity per hardware cost. The multilayered interconnect network is linearly scalable with the number of processing elements and allows for intradielet and multidielet scaling. The network features deterministic routing and timing for fast program compilation, and its translation and rotation symmetries allow for hardware resource reallocation. Multidielet scaling is enabled by energy-efficient, high-bandwidth, and high-density interdielet communication channels that seamlessly extend the intradielet routing network across the dielet boundaries using a 10-μm fine-pitch silicon interconnect fabric (Si-IF) interposer. A 2 × 2 multidielet UDSP on Si-IF can achieve a peak energy efficiency of 785 GMACs/J at 0.42 V and 315 MHz.
The interdielet communication channel, SNR-10, provides a shoreline bandwidth density of 297 Gb/s/mm at 1.1 Gb/s/pin at 0.8 V and nominal energy efficiency of 0.38 pJ/bit.
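
The headline figures above can be restated in more familiar units with a few lines of arithmetic. The following is a minimal sketch, not part of the original work: the function names are hypothetical, and the link-power estimate assumes that the quoted bandwidth density and the nominal energy per bit apply at the same operating point, which the abstract does not state.

```python
# Figures quoted in the abstract.
PEAK_EFFICIENCY_GMACS_PER_J = 785.0   # peak energy efficiency at 0.42 V, 315 MHz
SHORELINE_BW_GBPS_PER_MM = 297.0      # SNR-10 shoreline bandwidth density
PIN_RATE_GBPS = 1.1                   # per-pin data rate at 0.8 V
LINK_ENERGY_PJ_PER_BIT = 0.38         # nominal link energy efficiency


def energy_per_mac_pj(gmacs_per_joule: float) -> float:
    """Convert GMACs/J into picojoules per MAC (1 J = 1e12 pJ)."""
    return 1e12 / (gmacs_per_joule * 1e9)


def pins_per_mm(bw_gbps_per_mm: float, pin_rate_gbps: float) -> float:
    """Signal pins per mm of dielet edge implied by the bandwidth density."""
    return bw_gbps_per_mm / pin_rate_gbps


def link_power_mw_per_mm(bw_gbps_per_mm: float, pj_per_bit: float) -> float:
    """Link power per mm of edge at full bandwidth.

    Gb/s times pJ/bit gives mW, since 1e9 * 1e-12 W = 1e-3 W.
    Assumption: both figures hold at the same operating point.
    """
    return bw_gbps_per_mm * pj_per_bit


if __name__ == "__main__":
    print(f"{energy_per_mac_pj(PEAK_EFFICIENCY_GMACS_PER_J):.2f} pJ/MAC")
    print(f"{pins_per_mm(SHORELINE_BW_GBPS_PER_MM, PIN_RATE_GBPS):.0f} pins/mm")
    print(f"{link_power_mw_per_mm(SHORELINE_BW_GBPS_PER_MM, LINK_ENERGY_PJ_PER_BIT):.1f} mW/mm")
```

Under these assumptions, the quoted numbers correspond to roughly 1.27 pJ per MAC, about 270 signal pins per millimeter of dielet edge, and about 113 mW of link power per millimeter at full bandwidth.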