Leveraging Bit-Serial Architectures for Hardware-Oriented Deep Learning Accelerators with Column-Buffering Dataflow

Xiaoshu Cheng, Ping Li, Yiwen Wang, Weiran Ding, Hongfei Lou

doi:10.3390/electronics13071217

Abstract

Bit-serial neural network accelerators address the growing need for compact and energy-efficient deep learning tools. Traditional neural network accelerators, while effective, often grapple with issues of size, power consumption, and versatility in handling a variety of computational tasks. To counter these challenges, this paper introduces an approach that integrates bit-serial processing with advanced dataflow techniques and architectural optimizations. Central to this approach is a column-buffering (CB) dataflow, which significantly reduces access and movement requirements for the input feature map (IFM), thereby enhancing efficiency. Moreover, a simplified quantization process effectively eliminates biases, streamlining the overall computation. Furthermore, this paper presents a LeNet-5 accelerator built on a convolutional layer processing element array (CL PEA) architecture that incorporates an improved bit-serial multiply–accumulate (MAC) unit. Empirically, our work demonstrates superior performance in terms of frequency, chip area, and power consumption compared to current state-of-the-art ASIC designs. Specifically, our design uses fewer hardware resources to implement a complete accelerator, achieving a high performance of 7.87 GOPS on a Xilinx Kintex-7 FPGA with a brief processing time of 284.13 μs. The results affirm that our design is well suited for applications requiring compact, low-power, and real-time solutions.
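For readers unfamiliar with bit-serial arithmetic, the general idea behind a bit-serial MAC can be sketched in software. This is only an illustration of the generic technique (one weight bit consumed per step, with shift-and-add accumulation); it is not the improved MAC unit described in the paper, and the function name `bit_serial_mac`, the `bits` parameter, and the choice to serialize the weight rather than the activation are all assumptions made for this example.

```python
def bit_serial_mac(activations, weights, bits=8):
    """Return sum(a * w), consuming each weight one bit per step.

    Hypothetical helper for illustration only. Weights are treated as
    signed two's-complement integers that fit in `bits` bits.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    acc = 0
    for a, w in zip(activations, weights):
        if not lo <= w <= hi:
            raise ValueError(f"weight {w} does not fit in {bits} signed bits")
        w_bits = w & mask  # two's-complement bit pattern of the weight
        for i in range(bits):  # one "cycle" per weight bit, LSB first
            if (w_bits >> i) & 1:
                partial = a << i  # activation shifted to this bit's position
                # In two's complement the most significant bit has negative weight.
                acc += -partial if i == bits - 1 else partial
    return acc


print(bit_serial_mac([3, -2, 5], [4, -7, -128]))  # -614
```

A hardware unit would replace the inner loop with a single adder and shifter reused across cycles, which is where the area savings of bit-serial designs typically come from, at the cost of more cycles per multiplication.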

Journal: Electronics
Publication Date: Mar 26, 2024
License type: CC BY 4.0

Similar Papers

Stick Buffer Cache v2: Improved Input Feature Map Cache for Reducing off-chip Memory Traffic in CNN Accelerators
Rastislav Struharik ... Vuk Vranjkovic
01 Nov 2019

Striping input feature map cache for reducing off-chip memory traffic in CNN accelerators
Rastislav Struharik ... Vuk Vranjković
Telfor Journal | VOL. 12
01 Jan 2020

WMDRS: Workload-Aware Performance Model Based Multi-Task Dynamic-Quota Real-Time Scheduling for Neural Processing Units
Chong Liu ... Yi Dang
01 Jan 2023

Sparse convolutional neural network acceleration with lossless input feature map compression for resource‐constrained systems
Jisu Kwon ... Joonho Kong
IET Computers & Digital Techniques | VOL. 16
29 Nov 2021
