Abstract

Deep Neural Networks (DNNs) have been widely used in various artificial intelligence (AI) applications due to their overwhelming performance. Furthermore, several recently reported algorithms require on-device training to deliver higher performance in real-world environments and to protect users' personal data. However, edge/mobile devices have only limited computation capability and battery power, so an energy-efficient DNN training processor is necessary to realize on-device training. Although there are many surveys on energy-efficient DNN inference hardware, training is quite different from inference, so analysis and optimization techniques targeting DNN training are required. This article provides an overview of energy-efficient DNN processing that enables on-device training. Specifically, it presents hardware optimization techniques that overcome the design challenges in terms of distinct dataflow, external memory access, and computation. In addition, this article summarizes the key schemes of recent energy-efficient DNN training ASICs and presents a design example of a DNN training ASIC with energy-efficient optimization techniques.

Highlights

  • Deep neural networks (DNNs) [1] have been widely studied across all technology domains due to their superior accuracy in various applications such as computer vision [2]–[11], natural language processing (NLP) [12], [13], and autonomous systems [14]–[16]

  • DNN training requires a significant number of operations, so users' edge/mobile devices have traditionally provided only inference, using downloaded DNN parameters pre-trained on cloud servers

  • References [17]–[19] proposed DNN training schemes that use private datasets stored on user devices

Summary

INTRODUCTION

Deep neural networks (DNNs) [1] have been widely studied across all technology domains due to their superior accuracy in various applications such as computer vision [2]–[11], natural language processing (NLP) [12], [13], and autonomous systems [14]–[16]. DNN training iteratively processes three distinct steps, commonly forward propagation, backward propagation, and weight gradient update, to find high-accuracy model parameters, incurring a large number of operations, extensive external memory access, and varied dataflows. This makes the realization of on-device DNN training very challenging, since edge devices have only limited computation capability and battery power. Although there are many studies on highly energy-efficient DNN inference hardware that optimize memory access and computation, DNN training is quite different from DNN inference. We analyze three design challenges for energy-efficient DNN training: dataflow, external memory access, and computation. We introduce the optimization techniques of recent DNN training hardware and describe an example of an energy-efficient DNN training ASIC design.
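To make the three training steps concrete, the following minimal NumPy sketch walks through one training iteration of a single fully connected layer. The layer shape, mean-squared-error loss, and plain SGD update are illustrative assumptions, not details from the article; the point is that backward propagation and the weight gradient step reuse the same tensors with different dataflows, which is one source of the hardware design challenges discussed below.

```python
import numpy as np

# Hypothetical single-layer example illustrating the three training steps:
#   1. forward propagation (FP)    - compute activations and the loss
#   2. backward propagation (BP)   - propagate the error to the earlier layer
#   3. weight gradient update (WG) - compute weight gradients and update

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64))          # input activation batch
y = rng.standard_normal((32, 10))          # illustrative regression targets
W = rng.standard_normal((64, 10)) * 0.01   # layer weights
lr = 1e-2                                  # learning rate (assumed)

# 1. FP: activations and mean-squared-error loss
out = x @ W
loss = np.mean((out - y) ** 2)

# 2. BP: error w.r.t. the layer output, then the input gradient
#    that would feed the preceding layer (transposed-weight dataflow)
d_out = 2.0 * (out - y) / out.size
d_x = d_out @ W.T

# 3. WG: weight gradient (activation-transposed dataflow) and SGD update
d_W = x.T @ d_out
W -= lr * d_W

print(f"loss = {loss:.4f}")
```

Note that inference executes only the first step, whereas training must also keep `x` alive for the WG step and stream `W` in transposed order for BP, hinting at why training-specific dataflow and memory optimizations are needed.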

HW DESIGN CHALLENGES FOR DNN TRAINING
DNN TRAINING ASIC DESIGN EXAMPLE
SUMMARY OF DNN TRAINING ASICS
CONCLUSION