Abstract

Recently, several hardware accelerators have been reported for deep-neural-network (DNN) processing; however, they focus only on inference rather than DNN learning, which is a crucial ingredient for user adaptation at the edge device as well as for transfer learning with domain-specific data. DNN learning requires much heavier floating-point (FP) computation and memory access than DNN inference; thus, dedicated DNN learning hardware is essential. In this letter, we present an energy-efficient DNN learning accelerator core that supports learning as well as inference for both convolutional (CNN) and fully connected (FC) layers, with the following three key features: 1) fine-grained mixed precision (FGMP); 2) compressed sparse DNN learning/inference; and 3) an input load balancer. As a result, energy efficiency is improved by $1.76\times$ compared to sparse FP16 operation without any degradation of learning accuracy. The energy efficiency is $4.9\times$ higher than that of the NVIDIA V100 GPU, and the normalized peak performance is $3.47\times$ higher than that of a previous DNN learning processor.
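The letter itself does not include code, but the FGMP idea can be illustrated in software: most values are kept in a narrow FP format, and only values that the narrow format cannot represent accurately fall back to FP16. The sketch below is a minimal emulation of that split under stated assumptions; the function name `fgmp_quantize`, the FP8-style format (4 mantissa bits emulated by rounding, range cap `fp8_max`), and the threshold value are illustrative assumptions, not the paper's exact scheme.

```python
# Minimal software sketch of fine-grained mixed precision (FGMP).
# Assumption: per-value selection between an FP8-like narrow format
# and FP16, chosen by whether the value fits the narrow format's range.
import numpy as np


def fgmp_quantize(x: np.ndarray, fp8_max: float = 448.0):
    """Split a tensor into an FP8-like part and FP16 outliers.

    Values within +/- fp8_max are rounded to an FP8-like grid
    (mantissa truncated by rounding to 1/16 steps); out-of-range
    values stay in FP16. Returns the mixed tensor and the mask of
    values stored in the narrow format.
    """
    in_range = np.abs(x) <= fp8_max
    # Emulate a short mantissa: frexp gives mantissa in [0.5, 1),
    # which we round to 4 fractional bits before reconstructing.
    mant, expo = np.frexp(x)
    mant_q = np.round(mant * 16.0) / 16.0
    x_narrow = np.ldexp(mant_q, expo).astype(np.float16)
    return np.where(in_range, x_narrow, x.astype(np.float16)), in_range


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    act = rng.normal(scale=100.0, size=8).astype(np.float32)
    mixed, narrow_mask = fgmp_quantize(act)
    print("input          :", act)
    print("fgmp output    :", mixed)
    print("narrow fraction:", narrow_mask.mean())
```

In hardware, the payoff of such a scheme is that the common-case narrow-format operands halve datapath width and memory traffic relative to uniform FP16, which is consistent with the reported $1.76\times$ energy-efficiency gain over sparse FP16 operation.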
