Abstract

Recently, several hardware accelerators have been reported for deep-neural-network (DNN) processing; however, they focus only on inference rather than DNN learning, which is a crucial ingredient for user adaptation at the edge device as well as for transfer learning with domain-specific data. DNN learning requires much heavier floating-point (FP) computation and memory access than DNN inference; thus, dedicated DNN learning hardware is essential. In this letter, we present an energy-efficient DNN learning accelerator core that supports learning as well as inference for both convolutional (CNN) and fully connected (FC) layers, with the following three key features: 1) fine-grained mixed precision (FGMP); 2) compressed sparse DNN learning/inference; and 3) an input load balancer. As a result, energy efficiency is improved by $1.76\times$ compared to sparse FP16 operation without any degradation of learning accuracy. The energy efficiency is $4.9\times$ higher than that of the NVIDIA V100 GPU, and the normalized peak performance is $3.47\times$ higher than that of a previous DNN learning processor.
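The letter itself does not include code, but the FGMP idea can be illustrated in software: most values are kept in a narrow FP format, and only values that the narrow format cannot represent accurately fall back to FP16. The sketch below is a minimal emulation of that split under stated assumptions; the function name `fgmp_quantize`, the FP8-style format (4 mantissa bits emulated by rounding, range cap `fp8_max`), and the threshold value are illustrative assumptions, not the paper's exact scheme.

```python
# Minimal software sketch of fine-grained mixed precision (FGMP).
# Assumption: per-value selection between an FP8-like narrow format
# and FP16, chosen by whether the value fits the narrow format's range.
import numpy as np


def fgmp_quantize(x: np.ndarray, fp8_max: float = 448.0):
    """Split a tensor into an FP8-like part and FP16 outliers.

    Values within +/- fp8_max are rounded to an FP8-like grid
    (mantissa truncated by rounding to 1/16 steps); out-of-range
    values stay in FP16. Returns the mixed tensor and the mask of
    values stored in the narrow format.
    """
    in_range = np.abs(x) <= fp8_max
    # Emulate a short mantissa: frexp gives mantissa in [0.5, 1),
    # which we round to 4 fractional bits before reconstructing.
    mant, expo = np.frexp(x)
    mant_q = np.round(mant * 16.0) / 16.0
    x_narrow = np.ldexp(mant_q, expo).astype(np.float16)
    return np.where(in_range, x_narrow, x.astype(np.float16)), in_range


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    act = rng.normal(scale=100.0, size=8).astype(np.float32)
    mixed, narrow_mask = fgmp_quantize(act)
    print("input          :", act)
    print("fgmp output    :", mixed)
    print("narrow fraction:", narrow_mask.mean())
```

In hardware, the payoff of such a scheme is that the common-case narrow-format operands halve datapath width and memory traffic relative to uniform FP16, which is consistent with the reported $1.76\times$ energy-efficiency gain over sparse FP16 operation.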
