NZESPA: A Near-3D-Memory Zero Skipping Parallel Accelerator for CNNs

Palash Das, Hemangee K Kapoor

doi:10.1109/tcad.2020.3022330

Abstract

Convolutional neural networks (CNNs) are among the most popular machine learning tools for computer vision. Their ubiquitous use across applications, combined with their high computational cost, makes them attractive targets for optimization through accelerator architectures. State-of-the-art designs have either exploited the parallelism of CNNs, eliminated computations through sparsity, or used near-memory processing (NMP) to accelerate them. We introduce an NMP-fully sparse architecture that combines all three capabilities. The proposed architecture is parallel and therefore processes independent CNN tasks concurrently. To exploit sparsity, the system employs a dataflow called the Near-3D-Memory Zero Skipping Parallel dataflow, or nZESPA dataflow. This dataflow maintains a compressed-sparse encoding of the data and skips all ineffectual zero-valued computations of CNNs. We design a custom accelerator that employs the nZESPA dataflow and integrate grids of nZESPA modules into the logic layer of the Hybrid Memory Cube (HMC). This integration implements NMP and saves a significant amount of off-chip communication. We compare the proposed architecture with three baselines: one that does not exploit sparsity (NMP-dense), one that does not employ NMP (traditional-fully sparse), and one that does neither (traditional-dense). The proposed system outperforms all three baselines in both performance and energy consumption when executing CNN inference.
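The zero-skipping idea can be illustrated with a minimal sketch. It assumes a simple (indices, values) compressed encoding and a 1-D convolution; the names `compress`, `sparse_dot`, and `conv1d_sparse` are hypothetical, and the index-matching technique shown is a generic one, not the nZESPA dataflow or encoding described in the paper. Only pairs in which both the activation and the weight are nonzero are multiplied, so counting the multiply-accumulates (MACs) actually performed shows how many ineffectual zero-valued operations are skipped.

```python
from typing import List, Tuple

# Hypothetical compressed encoding: (indices of nonzeros, their values).
Compressed = Tuple[List[int], List[float]]


def compress(dense: List[float]) -> Compressed:
    """Encode a dense vector as the indices and values of its nonzero entries."""
    idx = [i for i, v in enumerate(dense) if v != 0]
    return idx, [dense[i] for i in idx]


def sparse_dot(a: Compressed, b: Compressed) -> Tuple[float, int]:
    """Dot product of two compressed vectors.

    Walks both sorted index lists with two pointers and multiplies only
    where the indices match (both operands nonzero).
    Returns (result, number_of_MACs_performed).
    """
    (ai, av), (bi, bv) = a, b
    i = j = 0
    acc, macs = 0.0, 0
    while i < len(ai) and j < len(bi):
        if ai[i] == bi[j]:
            acc += av[i] * bv[j]  # effectual computation: both nonzero
            macs += 1
            i += 1
            j += 1
        elif ai[i] < bi[j]:
            i += 1  # activation nonzero, weight zero: skipped
        else:
            j += 1  # weight nonzero, activation zero: skipped
    return acc, macs


def conv1d_sparse(activations: List[float], weights: List[float]) -> Tuple[List[float], int]:
    """Valid 1-D convolution (cross-correlation) that skips zero operands."""
    k = len(weights)
    w = compress(weights)
    out: List[float] = []
    total_macs = 0
    for start in range(len(activations) - k + 1):
        # Re-compressing each window is for clarity only.
        window = compress(activations[start:start + k])
        y, macs = sparse_dot(window, w)
        out.append(y)
        total_macs += macs
    return out, total_macs


if __name__ == "__main__":
    out, macs = conv1d_sparse([0, 2, 0, 0, 3, 1, 0, 4], [1, 0, 2])
    print(out, macs)
```

For activations `[0, 2, 0, 0, 3, 1, 0, 4]` and weights `[1, 0, 2]`, a dense implementation performs 6 × 3 = 18 MACs, while this sketch performs 6 and produces the same outputs, `[0, 2, 6, 2, 3, 9]`. The abstract's dataflow instead maintains the compressed encoding throughout rather than re-encoding each window.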


Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Publication Date: Sep 8, 2020
Citations: 4

Similar Papers

CNNParted: An open source framework for efficient Convolutional Neural Network inference partitioning in embedded systems
Fabian Kreß ... Jürgen Becker
Computer Networks | VOL. 229 | 08 Apr 2023

Arithmetic Coding-Based 5-Bit Weight Encoding and Hardware Decoder for CNN Inference in Edge Devices
Jong Hun Lee ... Arslan Munir
IEEE Access | VOL. 9 | 01 Jan 2020

Functionality-Based Processing-in-Memory Accelerator for Deep Convolutional Neural Networks
Min-Jae Kim ... Jeong-Geun Kim
IEEE Access | VOL. 9 | 01 Jan 2020

Towards cross-modal pre-training and learning tempo-spatial characteristics for audio recognition with convolutional and recurrent neural networks
Shahin Amiriparian ... Björn Schuller
EURASIP Journal on Audio, Speech, and Music Processing | VOL. 2020 | 01 Dec 2020
