Abstract

Training deep neural networks (DNNs) on edge devices has attracted increasing attention in real-world applications for domain adaptation and privacy protection. However, deploying DNN training on resource-limited edge devices is challenging because training involves massive computation and data movement. To address this issue, we propose THETA, an energy-efficient training accelerator that employs a hybrid compression strategy. Various data redundancies are fully exploited, and true triple-side sparsity is achieved, so the computational complexity is drastically reduced with negligible accuracy loss across a range of transfer learning tasks. To facilitate triple-side zero-skipping operations during different training stages, we first present a novel sparse data representation and a triple-sparsity index-matching scheme. Second, a sparse tensor processing unit (STPU) arranged in a hierarchical structure is developed, enabling a flexible dataflow that processes convolutional (Conv) and fully connected (FC) layers with diverse computational patterns throughout the entire training. Third, an auxiliary processing unit (APU) is designed to execute postprocessing operations such as the rectified linear unit (ReLU) and on-the-fly pruning. Finally, the training accelerator is implemented in a Taiwan Semiconductor Manufacturing Company (TSMC) 28-nm process and evaluated on multiple benchmarks. The experimental results show that THETA achieves 7.28–22.32 tera operations per second (TOPS) and 45.24–133.70 TOPS/W in performance and energy efficiency, reducing training time by 40–72× and energy consumption by 19–63× compared with dense training. Compared with the prior art, our design offers 1.6× higher throughput and 1.9× higher energy efficiency.
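The abstract does not detail the index-matching hardware, but a minimal software sketch can illustrate the zero-skipping idea behind sparsity-aware computation: each operand is stored as its nonzero values plus their indices, and a multiply-accumulate is issued only when the indices of both operands match. This is only a hedged illustration; the representation and function names below are hypothetical and do not reflect the paper's actual sparse format, triple-sparsity index-matching scheme, or STPU dataflow.

```python
import numpy as np

def to_sparse(dense):
    """Store only nonzero values plus their positions (a generic value+index view)."""
    idx = np.flatnonzero(dense)
    return dense.flat[idx], idx

def sparse_dot(vals_a, idx_a, vals_b, idx_b):
    """Multiply-accumulate only where both operands hold a nonzero at the same
    position, i.e. index matching lets zeros on either side be skipped."""
    acc, i, j = 0.0, 0, 0
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:      # indices match -> useful work
            acc += vals_a[i] * vals_b[j]
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:     # no partner -> the zero product is skipped
            i += 1
        else:
            j += 1
    return acc

# Tiny usage example with a sparse activation vector and a sparse weight vector
act = np.array([0.0, 1.5, 0.0, 0.0, 2.0, 0.0])
wgt = np.array([0.7, 0.0, 0.0, 0.0, 0.5, 0.0])
va, ia = to_sparse(act)
vw, iw = to_sparse(wgt)
assert np.isclose(sparse_dot(va, ia, vw, iw), act @ wgt)
```

In this toy form, the number of multiplications scales with the number of matched nonzero pairs rather than with the vector length, which is the software analogue of the compute savings the accelerator seeks from triple-side sparsity.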
