Abstract

Deep convolutional neural networks (CNNs) provide state-of-the-art (SotA) results in a variety of machine learning (ML) applications, such as image classification and object detection. The trends towards higher-resolution input images and large-scale model structures have brought drastic performance improvements for these tasks. Yet, these trends also place increased stress on hardware resources. Recent work proposed an advanced layer fusion method relying on line-buffer depth-first (LBDF) processing to reduce memory usage and off-chip memory accesses in high-resolution CNN processing. However, when deploying such an LBDF approach for residual networks, the required external memory accesses or on-chip memory capacity increase drastically again, canceling many of the benefits of the depth-first approach. This paper therefore outlines and analyzes possible methods for handling residual connections (RC) in combination with LBDF. With the proposed ReLU-based compression and tiling techniques, we minimize the overhead of RC on on-chip memory requirements and external memory accesses. Moreover, a proposed hybrid strategy, which optimally combines the different methods across the residual blocks of a network, reduces inference energy cost by 24.7% for ResNet20 and 45.1% for SqueezeNet, and reduces execution latency by 19.4% and 46.7% respectively, compared to the best single method under certain on-chip memory budgets. In addition, compared to the SotA, our hybrid strategy in the best case consumes only 43% of the energy and 29.9% of the latency with the same on-chip memory, or requires only 27.4% of the on-chip memory to achieve the same performance.
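As an illustrative sketch only (not the exact scheme of the paper), the intuition behind ReLU-based compression of residual-connection data can be shown with a simple zero run-length encoding: because post-ReLU activations are non-negative and typically sparse, the feature-map lines buffered for a skip connection can be stored as (zero-run, value) pairs instead of dense arrays. All function names and the encoding below are assumptions made for illustration.

import numpy as np

def zrle_compress(line: np.ndarray):
    """Encode a 1-D post-ReLU activation line as (zero_run_length, value) pairs."""
    assert (line >= 0).all(), "expects post-ReLU (non-negative) activations"
    pairs = []
    zero_run = 0
    for v in line:
        if v == 0:
            zero_run += 1
        else:
            pairs.append((zero_run, float(v)))  # flush the zero run before a non-zero value
            zero_run = 0
    if zero_run:
        pairs.append((zero_run, 0.0))           # trailing zeros, marked with sentinel value 0
    return pairs

def zrle_decompress(pairs, length: int) -> np.ndarray:
    """Reconstruct the original activation line from (zero_run_length, value) pairs."""
    out = []
    for run, v in pairs:
        out.extend([0.0] * run)
        if v != 0.0:
            out.append(v)
    out.extend([0.0] * (length - len(out)))     # pad any implicit trailing zeros
    return np.array(out, dtype=np.float32)

# Example: a sparse post-ReLU line round-trips losslessly and needs far fewer stored entries.
line = np.maximum(np.random.randn(64).astype(np.float32), 0.0)
packed = zrle_compress(line)
assert np.allclose(zrle_decompress(packed, line.size), line)

In an LBDF setting, such a compressed representation would only need to hold the skip-connection lines on-chip until the corresponding output lines of the residual block are produced, which is where the memory-overhead reduction claimed in the abstract would come from.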
