Abstract

Convolutional neural networks (CNNs) deliver state-of-the-art performance on computer-vision tasks. Many scenarios, such as edge environments, require high-speed, low-power, and high-accuracy hardware for CNN inference. However, the number of weights is so large that embedded systems cannot store them in their limited on-chip memory. An alternative approach shrinks the input image to achieve real-time processing, but this causes a considerable drop in accuracy. Although pruned sparse CNNs and dedicated accelerators have been proposed, the random access they require incurs a large number of wide multiplexers for a high degree of parallelism, which complicates the design and makes it unsuitable for FPGA implementation. To address this problem, we propose filter-wise pruning with distillation and a block-RAM (BRAM)-based zero-weight-skipping accelerator. The pruning eliminates weights such that each filter retains the same number of nonzero weights and then retrains the network with distillation, preserving comparable accuracy. Furthermore, filter-wise pruning enables the accelerator to exploit inter-filter parallelism, in which a processing block for a layer executes its filters concurrently, with a straightforward architecture. We also propose an overlapped tiling algorithm, in which tiles are extracted with overlap to prevent both accuracy degradation and high utilization of the BRAMs that store high-resolution images. Our evaluation on semantic-segmentation tasks showed that our FPGA design achieves a 1.8-fold speedup and 18.0-fold higher power efficiency than a desktop GPU. Compared with a conventional FPGA implementation, the speedup and accuracy improvement were 1.09-fold and 6.6 points, respectively. Therefore, our approach is well suited to FPGA implementation and delivers considerable accuracy for embedded-system applications.
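
As a rough illustration of the filter-wise pruning described above, the sketch below keeps only the largest-magnitude weights in each convolutional filter so that every filter ends up with the same number of nonzero weights. The function name, the NumPy representation, and the choice of how many weights to keep are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def filter_wise_prune(weights, nonzeros_per_filter):
        # weights: (num_filters, in_channels, k_h, k_w); every filter keeps
        # exactly `nonzeros_per_filter` largest-magnitude weights and the
        # rest are set to zero (illustrative sketch, not the authors' code).
        pruned = np.zeros_like(weights)
        for f in range(weights.shape[0]):
            flat = weights[f].reshape(-1)
            keep = np.argsort(np.abs(flat))[-nonzeros_per_filter:]
            mask = np.zeros_like(flat)
            mask[keep] = 1.0
            pruned[f] = (flat * mask).reshape(weights[f].shape)
        return pruned

    # Example: 32 filters of shape 16x3x3, each keeping 16 nonzero weights
    w = np.random.randn(32, 16, 3, 3).astype(np.float32)
    w_sparse = filter_wise_prune(w, nonzeros_per_filter=16)

Because every filter retains the same nonzero count, a hardware processing block can assign one filter per processing element without load imbalance, which is what makes the inter-filter parallelism mentioned above straightforward to implement.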

Highlights

  • Pruning [13] is a compression technique that eliminates unnecessary weights below a threshold, converting dense weight matrices to unstructured sparse matrices

  • Convolutional neural networks (CNNs) [27] deliver state-of-the-art performance in computer-vision tasks such as object classification [25], object detection [30], and semantic segmentation [41]

  • We propose an overlapped tiling algorithm to reduce the utilization of on-chip memory on FPGAs for high-resolution images (Section 6)
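
As a rough sketch of the overlapped tiling idea in the last highlight, the code below splits a high-resolution image into fixed-size tiles whose borders overlap by a configurable margin, so each tile carries the surrounding context needed to avoid accuracy loss at tile boundaries while only one tile at a time must reside in on-chip memory. The function name, tile size, and overlap value are illustrative assumptions.

    import numpy as np

    def overlapped_tiles(image, tile_size, overlap):
        # Extract tile_size x tile_size tiles that share `overlap` pixels
        # with their neighbours (illustrative sketch; border tiles may be
        # smaller than tile_size in this simplified version).
        h, w = image.shape[:2]
        stride = tile_size - overlap
        for y in range(0, max(h - overlap, 1), stride):
            for x in range(0, max(w - overlap, 1), stride):
                yield image[y:y + tile_size, x:x + tile_size]

    # Example: a 1024x2048 image split into 512x512 tiles with 32-pixel overlap
    img = np.zeros((1024, 2048, 3), dtype=np.uint8)
    tiles = list(overlapped_tiles(img, tile_size=512, overlap=32))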

Summary

Introduction

Pruning [13] is a compression technique that eliminates unnecessary weights below a threshold, converting dense weight matrices to unstructured sparse matrices. This approach can lead to more than a 10-fold reduction in the number of parameters with comparable accuracy [13]. This study proposes a new algorithm/hardware co-design approach: filter-wise pruning with distillation together with a dedicated inter-layer pipelined accelerator for FPGA implementation. We apply our filter-wise pruning with distillation to a lightweight MobileNetV1-based network model and compare it with the state-of-the-art FPGA implementation. The previous FPGA-based accelerator presented at ARC 2019 is extended to use inter-filter parallelism (Section 5.1).
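
The retraining-with-distillation step referred to above can be sketched, under the usual knowledge-distillation formulation, as the pruned network (student) being retrained against the original dense network (teacher). The loss below, the temperature T, and the weighting alpha are generic assumptions rather than the paper's specific distillation scheme.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Hard-label term: ordinary cross-entropy against the ground truth.
        hard = F.cross_entropy(student_logits, labels)
        # Soft-label term: KL divergence between the softened teacher and
        # student output distributions, scaled by T^2 as usual.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        return alpha * hard + (1.0 - alpha) * soft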

Unstructured Nonzero Weight Matrices
Convolutional Neural Networks
Separable CONV
Sparse CONV
Batch Normalization Folding
Semantic Segmentation
Filter-Wise Pruning with Distillation
Distillation Scheme for Retraining Weights
Hardware Implementation
Convolutional Block
Overlapped Tiling Algorithm
Experimental Results
MobileNetV1-Based PSPNet
Accuracy Comparison for Sparseness Ratio and Quantization
Comparison with a Desktop GPU
Comparison with Other FPGA Implementation
Comparison with Other Pruning Method
Sparseness Approach for Weight Memory Reduction
FPGA Implementation for CNN-Based Semantic Segmentation
Sparse Convolutional Network Architecture
Zero-Weight Skipping Architecture
Zero-Weight and -Activation Skipping Architecture
Conclusion