LUT‐DSP usage trade‐off for re‐configurable convolution acceleration core based on small logarithmic floating point representation

Botao Xiong, Sicun Li, Xintong He, Rensheng Shen, Zezhao Zhou, Runhua Yang, Yuchun Chang, Sheng Fan

doi:10.1002/cta.3834

Abstract

The challenge in designing a high‐performance field‐programmable gate array (FPGA)‐based convolution accelerator is to take full advantage of the on‐chip computing resources. The reported CNN accelerators always exhaust the on‐chip DSPs and leave other computing resources under‐utilized. Hence, this brief presents a novel convolution acceleration core based on the small logarithmic floating‐point (SLFP) format, which results in three contributions. (1) The SLFP<3,5> multiplier is implemented with LUT6s only and, with the aid of the carry chain, operates at 650 MHz; it provides sufficient accuracy for most CNNs. In addition, a similar structure can be used to implement an SLFP<3,5> divider. (2) The DSPs in the TWO24 SIMD mode are cascaded to implement a 9‐input adder tree. The sum of the multiples of elements (e.g., , ) is easily obtained by configuring the last DSP in the 9‐input adder tree in the accumulation mode, which can support more kernels (e.g., , ) with a high utilization rate ( ). (3) The convolution core based on the SLFP format uses only LUT6s and DSPs to achieve 1300 MOPS, 433 MOPS, and 81 MOPS for , , and kernels, respectively. In summary, the proposed convolution accelerator balances the resource usage of LUT6s and DSPs. Moreover, because the distribution of SLFP numbers is very similar to that of FP32 numbers, most CNN models can be quantized using several simple scaling operations instead of a computing‐intensive retraining algorithm.
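To illustrate why a logarithmic format can keep multiplication and division out of the DSPs, the following is a minimal Python sketch of a generic log‐domain number system, not the SLFP<3,5> encoding itself, whose bit layout is not given in the abstract. The constant `FRAC_BITS` and the helpers `encode`, `decode`, `log_mul`, `log_div`, and `conv_window` are hypothetical names chosen for illustration. In the log domain, a product reduces to an integer addition of codes and a quotient to a subtraction, while the nine products of one window are summed in the linear domain, loosely mirroring the 9‐input adder tree described above.

```python
import math

# Assumption: 5 fractional bits in the log-domain code. This is an
# illustrative choice, not the paper's SLFP<3,5> bit layout.
FRAC_BITS = 5
SCALE = 1 << FRAC_BITS


def encode(x):
    """Encode nonzero x as (sign, code) with code = round(log2|x| * SCALE)."""
    if x == 0:
        raise ValueError("zero needs a dedicated code in a log format")
    sign = 0 if x > 0 else 1
    return sign, round(math.log2(abs(x)) * SCALE)


def decode(sign, code):
    """Map a (sign, code) pair back to a linear-domain float."""
    magnitude = 2.0 ** (code / SCALE)
    return -magnitude if sign else magnitude


def log_mul(a, b):
    """Multiply in the log domain: XOR the signs, add the integer codes."""
    return a[0] ^ b[0], a[1] + b[1]


def log_div(a, b):
    """Divide in the log domain: XOR the signs, subtract the integer codes."""
    return a[0] ^ b[0], a[1] - b[1]


def conv_window(weights, pixels):
    """Sum nine log-domain products in the linear domain."""
    if len(weights) != 9 or len(pixels) != 9:
        raise ValueError("expected exactly nine weights and nine pixels")
    return sum(
        decode(*log_mul(encode(w), encode(p)))
        for w, p in zip(weights, pixels)
    )
```

With these assumptions, powers of two are represented exactly, and any other product carries a relative error bounded by the code resolution (a factor of at most 2^(1/32) here).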

More From: International Journal of Circuit Theory and Applications

Similar Papers

A reconfigurable real‐time neuromorphic hardware for spiking winner‐take‐all network
Behrooz Abdoli ... Saeed Safari
International Journal of Circuit Theory and Applications | VOL. 48
01 Oct 2020

Enhanced Efficiency 3D Convolution Based on Optimal FPGA Accelerator
Hai Wang ... Mengjun Shao
IEEE Access | VOL. 5
01 Jan 2017

Design automation tools for FPGA design (panel)
Kella Knack ... Steve Trimberger
01 Jan 1993

Synergistically Exploiting CNN Pruning and HLS Versioning for Adaptive Inference on Multi-FPGAs at the Edge
Guilherme Korol ... Mateus Beck Rutzig
ACM Transactions on Embedded Computing Systems | VOL. 20
17 Sep 2021
