Multiply-and-Fire: An Event-Driven Sparse Neural Network Accelerator

Miao Yu, Trevor E Carlson, Venkata Pavan Kumar Miriyala, Tingting Xiang

doi:10.1145/3630255

Abstract

Deep neural network inference has become a vital workload for many systems, from edge-based computing to data centers. To reduce the performance and power requirements of deep neural networks (DNNs) running on these systems, pruning is commonly used to maintain most of the accuracy of the system while significantly reducing the workload requirements. Unfortunately, accelerators designed for unstructured pruning typically employ expensive methods to either determine non-zero activation-weight pairings or reorder computation. These methods require additional storage and memory accesses compared to the more regular data access patterns seen in structurally pruned models. However, even existing works that focus on the more regular access patterns seen in structured pruning continue to suffer from inefficient designs, which either ignore or expensively handle activation sparsity, leading to low performance. To address these inefficiencies, we leverage structured pruning and propose the multiply-and-fire (MnF) technique, which aims to solve these problems in three ways: (a) a novel event-driven dataflow that naturally exploits activation sparsity without complex, high-overhead logic; (b) an optimized dataflow that takes an activation-centric approach, maximizing the reuse of activation data in computation and ensuring that the data are fetched only once from off-chip global and on-chip local memory; and (c) an energy-efficient, high-performance sparsity-aware DNN accelerator built on the proposed event-driven dataflow. Our results show that the MnF accelerator achieves a significant improvement across a number of modern benchmarks and presents a new direction to enable highly efficient AI inference for both CNN and MLP workloads. Overall, this work achieves a geometric mean of 11.2× higher energy efficiency and 1.41× speedup compared to a state-of-the-art sparsity-aware accelerator.
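
The abstract does not give implementation details, so the following is only a minimal software sketch of the general idea behind an event-driven, activation-centric dataflow for one fully connected layer. It is not the authors' hardware design. The function name `event_driven_layer`, the `weights[i][j]` layout, and the `threshold` parameter are all assumptions made for illustration. Each non-zero input activation is treated as an event, is read once, and is multiplied against every weight that consumes it; zero activations trigger no work.

```python
def event_driven_layer(activations, weights, threshold=0.0):
    """Hypothetical sketch: weights[i][j] connects input i to output j.

    Returns the layer outputs and the number of input events processed.
    """
    n_out = len(weights[0]) if weights else 0
    acc = [0.0] * n_out
    events = 0
    for i, a in enumerate(activations):
        if a == 0:
            continue  # no event: a zero activation triggers no computation
        events += 1
        row = weights[i]  # the activation and its weight row are read once
        for j, w in enumerate(row):
            if w != 0:  # pruned (zero) weights are skipped
                acc[j] += a * w  # "multiply" step, accumulated per output
    # "Fire" step (assumed ReLU-like): only outputs above the threshold
    # become non-zero and would act as events for the next layer.
    outputs = [v if v > threshold else 0.0 for v in acc]
    return outputs, events


# Illustrative values only, not data from the paper.
acts = [0, 2, 0, 1]
w = [[1, 0, -1], [0.5, 0, 2], [3, 3, 3], [-2, 1, 0]]
outputs, events = event_driven_layer(acts, w)
print(outputs, events)
```

In this toy input, two of the four activations are zero, so only two weight rows are touched; the amount of work scales with the number of non-zero activations rather than with the layer width.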


More From: ACM Transactions on Architecture and Code Optimization

