Abstract

Advances in neuroscience have encouraged researchers to develop computational models that behave like the human brain. HMAX is one such biologically inspired model; it mimics the functions and structure of the primate visual cortex. HMAX has proven effective and versatile in multi-class object recognition while keeping a simple computational structure. Implementing HMAX in embedded systems remains a challenge, however, because its S2 phase is computationally the heaviest. Previous implementations such as CoRe16 have used a reconfigurable two-dimensional processing element (PE) array to speed up the S2 layer. However, CoRe16 produces each output pixel by accumulating partial sums from different PEs through an adder tree, which increases the runtime. To speed up the S2 layer, in this paper we propose SAFA (systolic accelerator for HMAX), a systolic-array-based architecture that computes and accelerates the S2 stage of HMAX. Using the output stationary (OS) dataflow, each PE in SAFA calculates its output pixel independently, without accumulating partial sums across multiple PEs, and SAFA also needs fewer multiplexers than reconfigurable accelerators. In addition, forwarding shared input or weight data between PEs under OS reduces the memory bandwidth requirement. Simulation results show that, compared with CoRe16, the runtime of the S2 stage is decreased by 5.7% and the required memory bandwidth is reduced by 3.53× on average across kernel sizes (except for kernel = 12). ASIC synthesis shows that SAFA also has lower power and area costs than other reconfigurable accelerators.

Highlights

  • The human brain is the most power-efficient processor

  • Matlab ran under macOS Mojave with a 2.6 GHz Intel Core i5 CPU and 8 GB of DDR3 memory

  • We synthesized the prototype of SAFA

Summary

Introduction

The human brain is the most power-efficient processor. The last three decades have seen great success in understanding the ventral and dorsal pathways of the human visual cortex. Sabarad et al. [11] composed processing element (PE) arrays into CoRe16, a reconfigurable two-dimensional convolution accelerator on FPGA, to accelerate the S2 layer, the computationally heaviest layer of the HMAX model. To further speed up the S2 layer, in this paper we propose SAFA (systolic accelerator for HMAX), a systolic-array-based accelerator for the S2 stage of the HMAX model. 2. We utilize the OS dataflow to compute each output pixel independently in its own PE, which speeds up the execution of the S2 layer in HMAX. 3. We compare the runtime of SAFA across different array shapes and identify the systolic array shape that best accelerates the S2 layer in HMAX.
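
The output stationary idea mentioned above can be illustrated with a small cycle-by-cycle simulation. This is only a sketch under stated assumptions, not SAFA's actual design: it models a generic OS systolic array performing a matrix product (a convolution can be lowered to this form), and the function name `os_systolic_matmul` and the skewed feeding schedule are illustrative choices that do not come from the paper. Each PE keeps one accumulator for its own output, operands enter only at the array edges, and every PE forwards the operands it received to its right and lower neighbours.

```python
def os_systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing A @ B.

    Hypothetical model: PE(i, j) owns output element (i, j).
    Rows of A enter from the left edge, columns of B from the top edge,
    each skewed by one cycle per row/column.
    """
    M, K, N = len(A), len(B), len(B[0])
    acc = [[0] * N for _ in range(M)]    # one stationary accumulator per PE
    a_reg = [[0] * N for _ in range(M)]  # operand registers flowing rightwards
    b_reg = [[0] * N for _ in range(M)]  # operand registers flowing downwards
    cycles = K + M + N - 2               # last operand pair reaches PE(M-1, N-1)

    for t in range(cycles):
        new_a = [[0] * N for _ in range(M)]
        new_b = [[0] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                if j == 0:
                    k = t - i            # row i is delayed by i cycles
                    new_a[i][j] = A[i][k] if 0 <= k < K else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]   # forwarded from left PE
                if i == 0:
                    k = t - j            # column j is delayed by j cycles
                    new_b[i][j] = B[k][j] if 0 <= k < K else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]   # forwarded from upper PE
                # The partial sum never leaves the PE: no adder tree is needed.
                acc[i][j] += new_a[i][j] * new_b[i][j]
        a_reg, b_reg = new_a, new_b
    return acc, cycles


result, cycles = os_systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result, cycles)
```

In this model each element of `A` and `B` is read from memory once at the array edge and then reused through forwarding, which is the kind of reuse that lowers the bandwidth requirement; the exact savings reported for SAFA depend on details that this sketch does not capture.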

Background and Preliminary Information
Spatial Pooling Grid
SAFA: Systolic Accelerator for HMAX
Structure of the Systolic Array
Processing Element
Schematic Dataflow for SAFA
Experiment Setup
Implementation Details
Comparison of Runtime
Comparison of Storage Bandwidth
Comparison of Unit Utilization
Sensitivity of Shape Array on Runtime
Area and Power Evaluation
Conclusions and Future Work
