A Systolic Accelerator for Neuromorphic Visual Recognition

Weixia Xu, Shuo Tian, Zhijie Yang, Shi Xu, Lei Wang, Shasha Guo, Jianfeng Zhang

doi:10.3390/electronics9101690

Open Access

https://doi.org/10.3390/electronics9101690

Journal: Electronics	Publication Date: Oct 15, 2020
Citations: 1	License type: CC BY 4.0

Affiliation: National University of Defense Technology

Abstract

Advances in neuroscience have encouraged researchers to develop computational models that behave like the human brain. HMAX is one such biologically inspired model; it mimics the functions and structure of the primate visual cortex. HMAX has shown its effectiveness and versatility in multi-class object recognition with a simple computational structure. Implementing HMAX in embedded systems remains a challenge, however, because its S2 phase is the most computationally demanding part of the model. Previous implementations such as CoRe16 have used a reconfigurable two-dimensional processing element (PE) array to speed up the S2 layer. However, CoRe16 produces each output pixel by accumulating partial sums from different PEs through an adder tree, which increases the runtime. To speed up execution of the S2 layer, we propose SAFA (systolic accelerator for HMAX), a systolic-array-based architecture that computes and accelerates the S2 stage of HMAX. With the output stationary (OS) dataflow, each PE in SAFA calculates its output pixel independently, without additional accumulation of partial sums across multiple PEs, and the design needs fewer multiplexers than reconfigurable accelerators. In addition, forwarding shared input or weight data between PEs under OS reduces the memory bandwidth requirement. Simulation results show that, compared with CoRe16, SAFA decreases the runtime of the S2 stage by 5.7% and reduces the required memory bandwidth by 3.53× on average across different kernel sizes (except for kernel = 12). Synthesis on ASIC also shows that SAFA has lower power and area costs than other reconfigurable accelerators.
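The output stationary idea described in the abstract can be illustrated with a small cycle-level sketch. This is not the SAFA design itself: it assumes a generic matrix product stands in for the S2 workload, and the function name `os_systolic_matmul` and the register layout are hypothetical. Each PE keeps its own accumulator for one output value, while operands enter at the array edges and are forwarded from neighbor to neighbor, so no adder tree across PEs is needed.

```python
def os_systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A x B.

    A is M x K and B is K x N (lists of lists). PE (i, j) owns the
    accumulator for C[i][j]. Rows of A enter from the left edge and
    columns of B enter from the top edge, each skewed by one cycle per
    row/column, and are forwarded to the right/down neighbor every cycle.
    Returns the result matrix and the number of cycles simulated.
    """
    M, K, N = len(A), len(A[0]), len(B[0])
    acc = [[0] * N for _ in range(M)]        # stationary outputs, one per PE
    a_reg = [[None] * N for _ in range(M)]   # operand moving left to right
    b_reg = [[None] * N for _ in range(M)]   # operand moving top to bottom
    cycles = K + M + N - 2                   # last operand reaches PE (M-1, N-1)

    for t in range(cycles):
        # Visit PEs from the far corner so each one reads its neighbor's
        # value from the previous cycle before that neighbor is updated.
        for i in reversed(range(M)):
            for j in reversed(range(N)):
                if j == 0:
                    k = t - i                # row i is delayed by i cycles
                    a_in = A[i][k] if 0 <= k < K else None
                else:
                    a_in = a_reg[i][j - 1]   # forwarded by the left neighbor
                if i == 0:
                    k = t - j                # column j is delayed by j cycles
                    b_in = B[k][j] if 0 <= k < K else None
                else:
                    b_in = b_reg[i - 1][j]   # forwarded by the upper neighbor
                a_reg[i][j], b_reg[i][j] = a_in, b_in
                if a_in is not None and b_in is not None:
                    acc[i][j] += a_in * b_in  # local accumulation only
    return acc, cycles


C, cycles = os_systolic_matmul([[1, 2, 3], [4, 5, 6]],
                               [[7, 8], [9, 10], [11, 12]])
print(C, cycles)
```

In this sketch only the edge PEs read operands from outside the array; every other PE reuses data forwarded by a neighbor, which is the mechanism the abstract credits for the lower memory bandwidth.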

Highlights

The human brain is the most power-efficient processor
Matlab ran under Mac OS Mojave with a 2.6 GHz Intel Core i5 CPU and 8 GB of DDR3 memory
We synthesized the prototype of SAFA

Summary

Introduction

The human brain is the most power-efficient processor. The last three decades have seen great success in understanding the ventral and dorsal pathways of the human visual cortex. Sabarad et al. [11] composed processing element (PE) arrays into CoRe16, a reconfigurable two-dimensional convolution accelerator on FPGA, to accelerate the S2 layer, the most computationally demanding layer of the HMAX model. To further speed up execution of the S2 layer, in this paper we propose SAFA (systolic accelerator for HMAX), a systolic-array-based accelerator for the S2 stage of the HMAX model.
2. We utilize the OS dataflow to compute each output pixel in every PE independently, which speeds up execution of the S2 layer in HMAX.
3. We compare the runtime of SAFA across different array shapes and identify the best shape of the systolic array for accelerating the S2 layer in HMAX.
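To make the array-shape comparison concrete, here is a rough first-order cycle model for an output stationary array. It is only a sketch under stated assumptions and not the simulator or the results used in the paper: the function names `os_cycles` and `rank_shapes` are hypothetical, the workload is treated as an `out_rows` x `out_cols` grid of outputs with `k` multiply-accumulates each, memory stalls and result drain time are ignored, and the example dimensions are illustrative.

```python
from math import ceil


def os_cycles(rows, cols, out_rows, out_cols, k):
    """Estimate cycles for a rows x cols output-stationary array.

    The output grid is split into tiles ("folds") that each fit on the
    array. One fold takes k cycles of accumulation plus rows + cols - 2
    cycles for the skewed operands to reach the farthest PE.
    """
    folds = ceil(out_rows / rows) * ceil(out_cols / cols)
    return folds * (k + rows + cols - 2)


def rank_shapes(num_pes, out_rows, out_cols, k):
    """Return all (rows, cols) factorizations of num_pes, fastest first."""
    shapes = [(r, num_pes // r) for r in range(1, num_pes + 1)
              if num_pes % r == 0]
    return sorted(shapes,
                  key=lambda s: os_cycles(s[0], s[1], out_rows, out_cols, k))


# Hypothetical workload: 8 x 8 outputs, 16 accumulations each, 16 PEs.
for rows, cols in rank_shapes(16, 8, 8, 16):
    print((rows, cols), os_cycles(rows, cols, 8, 8, 16))
```

Under this model, elongated shapes pay a longer fill latency per fold and need more folds for a square output grid, so the preferred shape depends on how the output dimensions divide across the array.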

Background and Preliminary Information

Spatial Pooling Grid

SAFA: Systolic Accelerator for HMAX

Structure of the Systolic Array

Processing Element

Schematic Dataflow for SAFA

Experiment Setup

Implementation Details

Comparison of Runtime

Comparison of Storage Bandwidth

Comparison of Unit Utilization

Sensitivity of Shape Array on Runtime

Area and Power Evaluation

Conclusions and Future Work
