O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference

Tong Geng, Runbin Shi, Ang Li, Chunshu Wu, Yanfei Li, Martin Herbordt, Wei Wu, Tianqi Wang

doi:10.1109/tpds.2020.3013637

Abstract

Binarized Neural Networks (BNNs), which significantly reduce computational complexity and memory demand, have shown potential in cost- and power-restricted domains, such as IoT and smart edge devices, where reaching certain accuracy bars is sufficient and real-time operation is highly desired. In this article, we demonstrate that the highly condensed BNN model can be shrunk significantly by dynamically pruning irregular redundant edges. Based on two new observations on BNN-specific properties, we propose an out-of-order (OoO) architecture, O3BNN-R, which can curtail edge evaluation in cases where the binary output of a neuron can be determined early at runtime during inference. As with instruction-level parallelism (ILP), such fine-grained, irregular, runtime pruning opportunities are traditionally presumed to be difficult to exploit. To further enhance the pruning opportunities, we adopt an algorithm/architecture co-design approach in which we augment the loss function during training with specialized regularization terms that favor edge pruning. We evaluate our design on an embedded FPGA using networks that include VGG-16 and AlexNet for ImageNet, and a VGG-like network for CIFAR-10. Results show that O3BNN-R without regularization can prune, on average, 30 percent of the operations without any accuracy loss, yielding a 2.2× inference speedup and, on average, a 34× energy-efficiency improvement over state-of-the-art BNN implementations on FPGA/GPU/CPU. With regularization during training, performance improves further, on average, by 15 percent.
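The idea of curtailing edge evaluation once a neuron's binary output is already decided can be illustrated with a small software sketch. This is not the paper's hardware design or its exact pruning conditions, which the abstract does not spell out. It assumes a common BNN formulation in which a neuron outputs 1 when the number of matching input/weight bits (an XNOR followed by a popcount) reaches a threshold. The function name `binary_neuron_early_exit` and its parameters are hypothetical.

```python
from typing import Sequence, Tuple


def binary_neuron_early_exit(
    inputs: Sequence[int], weights: Sequence[int], threshold: int
) -> Tuple[int, int]:
    """Evaluate one binarized neuron, stopping as soon as the output is decided.

    Assumption (not taken from the paper): inputs and weights are 0/1 bits and
    the neuron outputs 1 when popcount(XNOR(inputs, weights)) >= threshold.
    Returns (output_bit, number_of_edges_evaluated).
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")

    total_edges = len(inputs)
    matches = 0
    for evaluated, (x, w) in enumerate(zip(inputs, weights), start=1):
        # XNOR of two bits is 1 exactly when the bits are equal.
        if x == w:
            matches += 1
        remaining = total_edges - evaluated
        # Threshold already reached: the remaining edges cannot change the output.
        if matches >= threshold:
            return 1, evaluated
        # Threshold unreachable even if every remaining edge matches.
        if matches + remaining < threshold:
            return 0, evaluated
    # Reached only when there are no edges at all.
    return (1 if matches >= threshold else 0), total_edges


# Usage: the first neuron is decided after 4 of 8 edges, the second after 3 of 6.
print(binary_neuron_early_exit([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 1, 1, 1, 1, 0, 1], 4))
print(binary_neuron_early_exit([1, 1, 1, 1, 0, 0], [0, 0, 0, 0, 0, 0], 4))
```

In this sketch the number of skipped edges depends on the input data and the order in which edges are visited, which is why such pruning is irregular and only discoverable at runtime.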


Journal: IEEE Transactions on Parallel and Distributed Systems
Publication Date: Jan 1, 2021
Citations: 65
License type: publisher-specific-oa

Similar Papers

O3BNN
Tong Geng ... Tianqi Wang
26 Jun 2019

HW/SW Codesign for Approximation-Aware Binary Neural Networks
Abhilasha Dave ... Hussam Amrouch
IEEE Journal on Emerging and Selected Topics in Circuits and Systems | VOL. 13
01 Mar 2023

LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism
Tong Geng ... Chen Yang
01 Jul 2019

Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC
Eriko Nurvitadhi ... Asit Mishra
01 Dec 2016
