Abstract

In this paper, we present hardware accelerators created with high-level synthesis techniques for sparse and dense matrix multiplication operations. The cores can operate at different precisions and are designed to be integrated into a heterogeneous CPU-FPGA system for Edge AI applications. The methodology involves quantization- and sparsity-aware training and is applied to a human activity classification case study. We first investigate the effects of quantization and sparsity on the accuracy of neural networks with convolutional, dense, and recurrent layers, observing better tolerance to pruning when recurrent layers are present. We then propose hardware accelerators that can switch precision at run time and work with any matrix size up to a maximum configured at compile time. We compare the performance of these accelerators across precision and sparsity levels and create a performance model to enable workload balancing. The results show that the proposed sparse matrix multipliers outperform dense multipliers when sparsity exceeds 70%, and the improvement is more pronounced when higher-precision arithmetic or structural pruning is used. Additionally, sparsity levels as high as 99% can maintain the accuracy required by the network, especially when recurrent layers are deployed. Overall, the balance between sparse and dense performance depends on matrix shape, precision, structural pruning, and sparsity level, and performance modelling can be used to balance concurrent execution in a heterogeneous configuration.
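The abstract's crossover claim (sparse multipliers winning only above roughly 70% sparsity) can be illustrated with a simple operation-count sketch. This is not the paper's measured performance model; the layer shape and the MAC-counting helpers (`dense_macs`, `sparse_macs`) are illustrative assumptions.

```python
def dense_macs(m, k, n):
    # A dense GEMM performs m*k*n multiply-accumulates regardless of content.
    return m * k * n

def sparse_macs(m, k, n, sparsity):
    # A CSR-style SpMM only multiplies the non-zero entries of the
    # (m x k) weight matrix against the (k x n) activation matrix.
    nnz = int(round(m * k * (1.0 - sparsity)))
    return nnz * n

# Hypothetical layer shape: 256x256 weight matrix, batch of 64 activations.
m, k, n = 256, 256, 64
for s in (0.5, 0.7, 0.9, 0.99):
    ratio = dense_macs(m, k, n) / sparse_macs(m, k, n, s)
    print(f"sparsity {s:.0%}: dense/sparse MAC ratio = {ratio:.1f}x")
```

In practice the break-even point sits well above the naive MAC ratio, because each non-zero in a sparse format carries index-decoding overhead; this is one reason the paper reports gains only beyond 70% sparsity.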

Highlights

  • Over the last few years, novel hardware for deep learning in AI from well-known companies and start-ups has entered the market, focusing on high energy efficiency, high performance, and low cost

  • Real-time inference of deep neural networks (DNNs) on custom hardware has become increasingly relevant with low-precision arithmetic and training frameworks such as Google's 8-bit EdgeTPU devices and TensorFlow Lite [3]

  • We investigate the effects of deep quantization and pruning on accuracy with convolutional and recurrent layers targeting a motion detection application

Summary

Introduction

Over the last few years, novel hardware for deep learning in AI from well-known companies and start-ups has entered the market, focusing on high energy efficiency, high performance, and low cost. Matrix multiplication acceleration that combines sparse and dense arithmetic with multi-precision support, as proposed in this research, makes it possible to select the optimal hardware configuration for each task. Motivated by these observations, this paper reviews mixed- and arbitrary-precision approaches, which are better suited to reconfigurable hardware, and concludes that sparse operators remain an area open to new research.
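The idea of selecting the optimal hardware configuration per task can be sketched as a dispatch rule over per-core cycle estimates. This is a minimal sketch under assumed cost models; the lane count, index overhead factor, and helper names (`predict_cycles_dense`, `predict_cycles_sparse`, `choose_core`) are illustrative, not the paper's calibrated model.

```python
def predict_cycles_dense(m, k, n, lanes=8):
    # Illustrative dense-GEMM model: a fixed throughput of `lanes` MACs/cycle.
    return (m * k * n) / lanes

def predict_cycles_sparse(m, k, n, sparsity, lanes=8, index_overhead=2.0):
    # Illustrative SpMM model: work scales with the non-zeros, but each
    # non-zero costs extra cycles for index decoding.
    nnz = m * k * (1.0 - sparsity)
    return (nnz * n * index_overhead) / lanes

def choose_core(m, k, n, sparsity):
    # Dispatch each layer to whichever core the model predicts is faster.
    dense = predict_cycles_dense(m, k, n)
    sparse = predict_cycles_sparse(m, k, n, sparsity)
    return "sparse" if sparse < dense else "dense"

print(choose_core(256, 256, 64, 0.9))  # high sparsity favours the SpMM core
```

The same per-layer estimates could feed a workload balancer that splits concurrent layers between the CPU and FPGA cores in a heterogeneous system, as the abstract suggests.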

Background and related work
Sub-byte precision hardware
Methodology and case study
Pruning and quantization accuracy analysis
GEMM hardware
SPMM hardware
Performance and complexity analysis
Structural pruning optimization
Performance modelling
GEMM model
SPMM model
Conclusions and future work
