Abstract

Compute-in-Memory (CIM) implemented with Resistive-Random-Access-Memory (RRAM) crossbars is a promising approach to Deep Neural Network (DNN) acceleration. As DNN sizes continue to grow, the finite on-chip weight storage has become a challenge for CIM implementations. Pruning can reduce network size, but unstructured pruning is not compatible with CIM, while structured pruning incurs a larger accuracy drop. In this work we systematically evaluate how structured pruning can be efficiently implemented in CIM systems. We show that by exploiting the inherent computational granularity of CIM operations, fine-grained structured pruning can be supported with improved accuracy and minimal hardware cost. We discuss the hardware implementation in a practical system and the expected performance in terms of accuracy, energy, and effective throughput. With the proposed approach, a compression ratio of up to 11.1x (i.e., 9% of weights remaining) can be achieved with only a 0.6% accuracy drop and minimal hardware overhead.
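The abstract does not spell out the pruning algorithm, so the NumPy sketch below is only a rough illustration of the general idea: fine-grained structured pruning that zeroes groups of weight-matrix rows whose size matches the number of rows a CIM macro activates per operation. The group size of 8, the L2-norm saliency score, and the per-column group selection are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def prune_fine_grained(W, group_size=8, keep_ratio=0.09):
    """Zero weight groups aligned to an assumed CIM row granularity.

    W          : 2D weight matrix (in_features x out_features).
    group_size : rows activated together in one crossbar operation
                 (hypothetical; the real granularity is fixed by the
                 CIM macro design).
    keep_ratio : fraction of groups kept per column; 0.09 roughly
                 matches the 11.1x compression quoted in the abstract.
    """
    in_dim, out_dim = W.shape
    assert in_dim % group_size == 0, "pad W so rows divide evenly"
    n_groups = in_dim // group_size

    # View W as (n_groups, group_size, out_dim) and score each group,
    # per output column, by its L2 norm (a common saliency proxy).
    groups = W.reshape(n_groups, group_size, out_dim)
    scores = np.linalg.norm(groups, axis=1)        # (n_groups, out_dim)

    # Keep the top keep_ratio fraction of groups in each column.
    k = max(1, int(round(keep_ratio * n_groups)))
    thresh = np.sort(scores, axis=0)[-k, :]        # per-column cutoff
    mask = (scores >= thresh)[:, None, :]          # broadcast over rows
    return (groups * mask).reshape(in_dim, out_dim)

# Example: prune a random 128x64 layer to ~9% of its weight groups.
W = np.random.randn(128, 64).astype(np.float32)
W_pruned = prune_fine_grained(W)
print("nonzero fraction:", np.count_nonzero(W_pruned) / W_pruned.size)
```

Because the surviving weights stay contiguous at the assumed crossbar granularity, such a mask could in principle be exploited by skipping entire crossbar operations, which is why this style of pruning maps to CIM hardware with little overhead.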
