Deep neural networks (DNNs) are revolutionizing machine learning, achieving unprecedented accuracy on many AI tasks. Energy-efficient neural acceleration is crucial to broadening DNN applications in the cloud and on mobile end devices. However, power-hungry clock networks limit the energy efficiency of DNN accelerators. In this work, we propose a novel DNN hardware accelerator, called the asynchronous neural network processor (AsNNP). At the heart of AsNNP is a scalable hierarchical matrix-multiply unit built from bit-serial processing elements working in parallel. It replaces the global clock network with asynchronous handshake protocols for synchronization and communication between components, minimizing dynamic power. Meanwhile, a fine-grained asynchronous pipeline based on weak-conditioned half-buffers (WCHB) pipes successive computations in a data-driven manner, i.e., computation begins as soon as data arrives, maximizing throughput. These techniques enable AsNNP to operate in a fully data-driven asynchronous communication fashion with optimized energy efficiency. The proposed accelerator is implemented in a quasi-delay-insensitive (QDI) clockless logic family and evaluated in a 65 nm process. Compared with a synchronous baseline, simulation results show that AsNNP offers 2.2× higher equivalent frequency and 1.59× lower power. Compared with state-of-the-art DNN accelerators, AsNNP achieves a 1.17×–4.97× energy-efficiency improvement.
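The bit-serial processing elements mentioned above can be illustrated with a minimal software sketch. This is an illustrative assumption, not the paper's actual PE microarchitecture: each step consumes one bit of the activation and, if that bit is set, adds a correspondingly shifted copy of the weight to the accumulator, so a multiply-accumulate completes over `bits` sequential steps.

```python
def bit_serial_mac(activation: int, weight: int, acc: int = 0, bits: int = 8) -> int:
    """Illustrative bit-serial multiply-accumulate (hypothetical sketch).

    Processes one activation bit per step, mimicking how a bit-serial
    processing element would stream bits through a narrow datapath:
    acc += weight << i for every set bit i of the activation.
    """
    for i in range(bits):
        if (activation >> i) & 1:  # examine bit i of the activation
            acc += weight << i     # add the weight shifted by the bit position
    return acc
```

For example, `bit_serial_mac(13, 7)` accumulates `7 + (7 << 2) + (7 << 3) = 91`, matching `13 * 7`. In hardware, the appeal of this scheme is that each step needs only a shifter and an adder, which pairs naturally with the data-driven handshaking described in the abstract: a step fires when its input bit arrives rather than on a global clock edge.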