Abstract

Deep Neural Networks (DNNs) have shown significant advantages in many domains, such as pattern recognition, prediction, and control optimization. The demand for edge computing in the Internet of Things (IoT) era has motivated many kinds of computing platforms to accelerate DNN operations. However, due to their massively parallel processing, the performance of current large-scale artificial neural networks is often limited by huge communication overheads and storage requirements. As a result, efficient interconnection and data-movement mechanisms for future on-chip artificial intelligence (AI) accelerators are worthy of study, and a large body of research aims to find efficient on-chip interconnections that achieve low-power, high-bandwidth DNN computing. This paper provides a comprehensive investigation of recent advances in efficient on-chip interconnection and the design methodologies of DNN accelerators. First, we provide an overview of the different interconnection methods used in DNN accelerators. Then, we discuss interconnection methods in non-ASIC DNN accelerators. Furthermore, with a flexible interconnection, a DNN accelerator can support different computing flows, which increases its computing flexibility; with this motivation, we investigate reconfigurable DNN computing with flexible on-chip interconnection. Finally, we examine emerging interconnection technologies (e.g., in/near-memory processing) for DNN accelerator design. With this article, readers will be able to: 1) understand interconnection design for DNN accelerators; 2) evaluate DNNs with different on-chip interconnections; and 3) become familiar with the trade-offs under different interconnection designs.

Highlights

  • The Internet of Things (IoT) trend drives AI technology

  • A DNN is composed of a large number of neurons arranged in layers: an input layer, hidden layers, and an output layer

  • The outputs of one layer become the inputs of the next layer, until the result is obtained in the output layer (see the sketch after this list)
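
As a concrete illustration of this layer-by-layer dataflow, the following is a minimal NumPy sketch (our own example, not code from the paper); the layer sizes and the uniform ReLU activation are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, weights, biases):
    """Propagate input x layer by layer: each layer's outputs
    become the next layer's inputs, ending at the output layer."""
    for W, b in zip(weights, biases):
        x = relu(W @ x + b)  # ReLU applied uniformly for simplicity
    return x

# Hypothetical network: 4 inputs -> 8 hidden neurons -> 2 outputs
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 4)), rng.standard_normal((2, 8))]
biases = [np.zeros(8), np.zeros(2)]
print(forward(rng.standard_normal(4), weights, biases))
```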


Summary

INTRODUCTION

The Internet of Things (IoT) trend drives AI technology. The notable benefits of AI have led to advances in many real-world applications, such as speech recognition and image classification [1]. While accuracy is important for inference too, it is common practice in some applications to trade off accuracy for higher throughput or lower latency [3], [4]. Memory requirement is another difference between training and inference. Current large-scale DNNs involve complex communication, extensive computation, and large storage requirements, which are beyond the capability of current resource-constrained embedded devices based on general-purpose CPU and GPU processing elements. This has led to recent growing interest in developing domain-specific, resource-constrained platforms with dedicated processing, memory, and communication resources for DNN computation [6]. This article investigates interconnection methods for DNN operations according to different design scenarios.

Abbreviations used in this article:

AI: Artificial Intelligence
ANN: Artificial Neural Network
ASIC: Application Specific Integrated Circuit
CNN: Convolutional Neural Network
DNN: Deep Neural Network
FIFO: First-in First-out
FPGA: Field Programmable Gate Array
GPU: Graphical Processing Unit
IoT: Internet of Things
ISA: Instruction Set Architecture
LUT: Lookup Table
MAC: Multiply-Accumulation
NoC: Network on Chip
PCM: Phase-Change Memory
PE: Processing Element
ReRAM: Resistive Random-Access Memory
TSV: Through-Silicon-Via
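The multiply-accumulate (MAC) operation listed above is the core arithmetic performed by the processing elements (PEs) of the accelerators this survey covers. The following minimal Python sketch (our own illustration, not code from the paper) shows how a neuron's output reduces to a chain of MACs, which is why accelerators replicate MAC units across many PEs.

```python
def mac(acc: float, activation: float, weight: float) -> float:
    """One MAC step: acc + activation * weight."""
    return acc + activation * weight

def neuron_output(activations, weights, bias=0.0):
    """Dot product of inputs and weights, computed as a chain of MACs."""
    acc = bias
    for a, w in zip(activations, weights):
        acc = mac(acc, a, w)
    return acc

print(neuron_output([1.0, 2.0, 3.0], [0.5, -1.0, 0.25]))  # 0.5 - 2.0 + 0.75 = -0.75
```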

INTERCONNECTS IN ASIC NN ACCELERATORS
Array-based Interconnection in NN Accelerators
Mesh-based Interconnection in NN Accelerators
Non-mesh-based Interconnects
Reconfigurable Interconnects
INTERCONNECTS IN NON-ASIC NN ACCELERATORS
FPGA-based NN Operation
NN Operations on GPGPU
NN Operations on Manycore
NN Operations on Embedded Processors
INTERCONNECTS AND EMERGING TECHNOLOGIES
In-Memory and Near-Memory Processing
Wireless Interconnects
Optical Interconnects
FUTURE RESEARCH
Findings
CONCLUSION
