Abstract

This article presents a deep neural network (DNN) accelerator that uses an adaptive clocking technique, the elastic clock chain, to exploit the dynamic timing margin of a 2-D processing element (PE) array. Exploiting dynamic timing margin in modern deep learning accelerators faces two major challenges: the margin diminishes on a large array, and timing depends strongly on runtime operands. To address them, we propose an elastic clock chain that provides flexible multi-domain clock management for in situ compute adaptability. Specifically, the 2-D PE array is divided into 16 clock domains whose clock periods are adjusted dynamically based on both runtime instructions and operands. The multi-domain clock sources are generated by a multi-phase delay-locked loop (DLL) and delivered over a global clock bus. The clock offsets between neighboring domains are deliberately managed to maintain synchronization among clock domains. A 16 × 8 PE array that supports different DNN dataflows and bit precisions was fabricated in a 65-nm CMOS process. Measurement results on the MNIST and CIFAR-10 datasets show that enabling the elastic clock chain improves the effective operating frequency by up to 19% for a single instruction multiple data (SIMD) dataflow. This performance improvement translates into up to 34% energy saving. Compared with the SIMD dataflow, the systolic dataflow shows a smaller performance improvement of up to 11% because all in-flight operand values must be considered.
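
The idea of adjusting a domain's clock period per operand can be illustrated with a toy model. The following is only a minimal sketch under stated assumptions, not the fabricated design: the delay model, the selectable period steps, and all function names are hypothetical, and the sketch ignores the clock-offset management between neighboring domains.

```python
# Toy model of operand-dependent clock period selection for one clock domain.
# All constants and the delay model below are illustrative assumptions.

NUM_DOMAINS = 16            # from the abstract: 16 clock domains
WORST_CASE_PERIOD = 1.0     # normalized fixed clock period (assumption)
# Hypothetical selectable periods, standing in for multi-phase DLL taps.
PERIOD_STEPS = [0.80, 0.85, 0.90, 0.95, 1.00]


def required_delay(operand: int, bits: int = 8) -> float:
    """Assumed delay model: wider operands exercise longer paths."""
    width = min(abs(operand).bit_length(), bits)  # clamp to the datapath width
    return 0.75 + 0.25 * (width / bits)


def pick_period(operand: int) -> float:
    """Choose the shortest available period that still covers the delay."""
    need = required_delay(operand)
    return min(p for p in PERIOD_STEPS if p >= need)


def effective_speedup(operands: list[int]) -> float:
    """Speedup of elastic clocking over a fixed worst-case clock."""
    fixed_time = len(operands) * WORST_CASE_PERIOD
    elastic_time = sum(pick_period(op) for op in operands)
    return fixed_time / elastic_time


if __name__ == "__main__":
    # One hypothetical operand stream per domain.
    streams = [[(7 * d + 13 * i) % 256 for i in range(32)]
               for d in range(NUM_DOMAINS)]
    for d, ops in enumerate(streams):
        print(f"domain {d:2d}: speedup {effective_speedup(ops):.3f}")
```

In this sketch a stream of small operands runs at the shortest period, while a stream of full-width operands falls back to the worst-case period and gains nothing, which mirrors the operand dependence described in the abstract.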
