Abstract

Recent breakthroughs in deep neural networks have led to their widespread use in image and speech applications. Conventional deep neural networks (DNNs) are fully-connected multi-layer networks with hundreds or thousands of neurons in each layer. Such a network requires a very large weight memory to store the connectivity between neurons. In this paper, we propose a hardware-centric methodology to design low power neural networks with a significantly smaller memory footprint and reduced computation resource requirements. We achieve this by judiciously dropping connections in large blocks of weights. The corresponding technique, termed coarse-grain sparsification (CGS), introduces hardware-aware sparsity during DNN training, which leads to efficient weight memory compression and a significant reduction in computation during classification without losing accuracy. We apply the proposed approach to DNN design for keyword detection and speech recognition. When the two DNNs are trained with 75% of the weights dropped and classified with 5–6 bit weight precision, the weight memory requirement is reduced by 95% compared to their fully-connected counterparts with double-precision weights, while maintaining similar performance in keyword detection accuracy, word error rate, and sentence error rate. To validate this technique in real hardware, a time-multiplexed architecture using a shared multiply and accumulate (MAC) engine was implemented in 65 nm and 40 nm low power (LP) CMOS. In 40 nm at 0.6 V, the keyword detection network consumes 36 µW and the speech recognition network consumes 552 µW, making this technique highly suitable for mobile and wearable devices.
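
To make the coarse-grain sparsification idea concrete, the following is a minimal sketch of block-wise weight dropping: whole blocks of a layer's weight matrix are zeroed so that the dropped blocks need neither storage nor MAC operations. The function name `cgs_mask`, the block size of 16, and the random choice of which blocks to drop are illustrative assumptions for this sketch; the paper selects and enforces the dropped blocks during training rather than after the fact.

```python
import numpy as np

def cgs_mask(n_out, n_in, block_size=16, drop_ratio=0.75, rng=None):
    """Build a coarse-grain sparsity mask that zeroes whole blocks of weights.

    Illustrative sketch: block_size and the random block selection are
    assumptions, not the paper's training-time block selection criterion.
    """
    rng = np.random.default_rng() if rng is None else rng
    rows = int(np.ceil(n_out / block_size))
    cols = int(np.ceil(n_in / block_size))
    # Keep (1 - drop_ratio) of the blocks; dropped blocks are removed entirely.
    keep = rng.random((rows, cols)) >= drop_ratio
    # Expand the block-level keep pattern to an element-level mask.
    mask = np.kron(keep, np.ones((block_size, block_size)))
    return mask[:n_out, :n_in].astype(np.float32)

# Apply the mask to a layer's weights: only the kept blocks are stored
# and multiplied, which is what enables the memory and MAC savings.
W = np.random.randn(512, 512).astype(np.float32)
mask = cgs_mask(*W.shape, block_size=16, drop_ratio=0.75)
W_sparse = W * mask
print(f"fraction of weights kept: {W_sparse.astype(bool).mean():.2f}")
```

Because the sparsity pattern is regular at the block level, the kept blocks can be packed contiguously in weight memory with a small block index, which is what makes the compression hardware-friendly compared to element-wise (fine-grain) pruning.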
