Abstract

Deep convolutional neural networks (CNNs) obtain outstanding results in tasks that require human-level understanding of data, like image or speech recognition. However, their computational load is significant, motivating the development of CNN-specialized accelerators. This work presents NEURAghe, a flexible and efficient hardware/software solution for the acceleration of CNNs on Zynq SoCs. NEURAghe leverages the synergistic usage of Zynq ARM cores and of a powerful and flexible Convolution-Specific Processor deployed on the reconfigurable logic. The Convolution-Specific Processor embeds both a convolution engine and a programmable soft core, releasing the ARM processors from most of the supervision duties and allowing the accelerator to be controlled by software at an ultra-fine granularity. This methodology opens the way for cooperative heterogeneous computing: while the accelerator takes care of the bulk of the CNN workload, the ARM cores can seamlessly execute hard-to-accelerate parts of the computational graph, taking advantage of the NEON vector engines to further speed up computation. Through the companion NeuDNN SW stack, NEURAghe supports end-to-end CNN-based classification with a peak performance of 169 GOps/s and an energy efficiency of 17 GOps/W. Thanks to our heterogeneous computing model, our platform improves upon the state of the art, achieving a frame rate of 5.5 frames per second (fps) on the end-to-end execution of VGG-16 and 6.6 fps on ResNet-18.
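
As a rough illustration of this cooperative model, the sketch below shows how a layer-by-layer dispatcher might split work between the Convolution-Specific Processor and the ARM cores. It is a minimal sketch under assumed names: layer_t, csp_run_conv and arm_run_layer are illustrative, not the actual NeuDNN interface.

```c
/* Illustrative sketch of the cooperative execution model described above:
 * convolutional layers are offloaded to the Convolution-Specific Processor (CSP)
 * in the programmable logic, while hard-to-accelerate layers run on the ARM
 * cores (optionally NEON-vectorized). All names are assumptions for exposition. */
#include <stddef.h>

typedef enum { LAYER_CONV, LAYER_FC, LAYER_POOL, LAYER_OTHER } layer_kind_t;

typedef struct {
    layer_kind_t kind;
    const float *weights;
    /* layer geometry omitted for brevity */
} layer_t;

/* Assumed offload call: blocks until the CSP finishes the convolution. */
void csp_run_conv(const layer_t *l, const float *in, float *out);
/* Assumed ARM/NEON fallback for non-convolutional layers. */
void arm_run_layer(const layer_t *l, const float *in, float *out);

void run_network(const layer_t *layers, size_t n_layers,
                 const float *input, float *scratch_a, float *scratch_b)
{
    const float *src = input;
    float *dst = scratch_a;

    for (size_t i = 0; i < n_layers; ++i) {
        if (layers[i].kind == LAYER_CONV)
            csp_run_conv(&layers[i], src, dst);   /* bulk of the workload on the PL */
        else
            arm_run_layer(&layers[i], src, dst);  /* hard-to-accelerate parts on the PS */

        /* ping-pong buffers between layers */
        src = dst;
        dst = (dst == scratch_a) ? scratch_b : scratch_a;
    }
}
```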

Highlights

  • In the last few years, Deep Convolutional Neural Networks have become the go-to solution for most tasks that require human-level understanding of data

  • As an integration to the second use case, we present an experiment related to the acceleration of a lightweight CNN topology, to provide insight into the possibility of using NEURAghe to accelerate recent algorithms conceived for extensive workload reduction

  • The accelerator implemented in the programmable logic is controllable via software: it integrates a microcontroller in charge of finely managing the basic operations of the other building blocks (an illustrative sketch of this control path follows this list)
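
One way to picture this fine-grained software control is the host filling a small job descriptor that the on-accelerator soft core interprets. The layout below is purely an assumption for illustration; the field names and the csp_submit_job call are not the actual NEURAghe interface.

```c
/* Illustrative only: a possible shape for job descriptors exchanged between the
 * ARM host and the soft core inside the Convolution-Specific Processor.
 * Field names and the submission mechanism are assumptions for exposition. */
#include <stdint.h>

typedef struct {
    uint32_t op;            /* requested operation, e.g. convolution / pooling / ReLU */
    uint32_t in_addr;       /* physical address of the input tile in shared memory */
    uint32_t weight_addr;   /* physical address of the kernel weights */
    uint32_t out_addr;      /* where the engine writes the output tile */
    uint16_t in_h, in_w;    /* input tile geometry */
    uint16_t in_ch, out_ch; /* channel counts */
    uint8_t  kernel_size;   /* e.g. 3 for 3x3 kernels */
    uint8_t  stride;
    uint8_t  relu;          /* apply ReLU on the output path */
    uint8_t  pool;          /* apply pooling on the output path */
} csp_job_t;

/* Assumed host-side call: hand the descriptor to the soft core, which then
 * sequences the line buffers, SoP modules and pooling/ReLU stages. */
int csp_submit_job(const csp_job_t *job);
```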

Summary

INTRODUCTION

In the last few years, Deep Convolutional Neural Networks have become the go-to solution for most tasks that require human-level understanding of data. Several dedicated accelerators have been proposed in the embedded domain, both from companies such as Movidius [26] and from the research community [4, 5, 9]. These architectures are typically implemented as systolic arrays of processing elements or as more specialized engines focused on the acceleration of convolution-accumulation loops, and they outperform all programmable solutions (including FPGAs) in both performance and energy efficiency thanks to their highly optimized implementation. NEURAghe, in contrast, allows implementing any kind of CNN model while fully exploiting the hardware and software capabilities of the Z-7045 SoC; moreover, it eases porting, with significant performance benefits, to next-generation Ultrascale+ SoCs. These SoCs feature a bigger and faster FPGA on the programmable logic (PL), which would allow hosting two convolution engines running at 200 MHz, and a more powerful processing system (PS) based on a quad-core ARM Cortex-A53 processor.
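
For reference, the convolution-accumulation loops that such engines specialize in follow the generic structure sketched below. This is plain, unoptimized C for illustration only, with assumed buffer layouts and names; it is not the NEURAghe convolution engine itself.

```c
/* Generic convolution-accumulation loop nest that CNN accelerators specialize in.
 * Unoptimized, valid-padding, stride-1 C for illustration; layouts are assumptions.
 * Input:  in[in_ch][in_h][in_w], weights: w[out_ch][in_ch][k][k],
 * output: out[out_ch][in_h-k+1][in_w-k+1], all flattened row-major. */
void conv2d(const float *in, int in_ch, int in_h, int in_w,
            const float *w,  int k,     /* k x k kernels */
            float *out,      int out_ch)
{
    int out_h = in_h - k + 1, out_w = in_w - k + 1;

    for (int oc = 0; oc < out_ch; ++oc)
        for (int y = 0; y < out_h; ++y)
            for (int x = 0; x < out_w; ++x) {
                float acc = 0.0f;
                /* multiply-accumulate over all input channels and kernel taps:
                 * this inner body is what sum-of-products datapaths unroll */
                for (int ic = 0; ic < in_ch; ++ic)
                    for (int ky = 0; ky < k; ++ky)
                        for (int kx = 0; kx < k; ++kx)
                            acc += in[(ic * in_h + y + ky) * in_w + (x + kx)]
                                 * w[((oc * in_ch + ic) * k + ky) * k + kx];
                out[(oc * out_h + y) * out_w + x] = acc;
            }
}
```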

RELATED WORK
Target computational model
System architecture
Convolution-Specific Processor
Convolution Engine
Line buffers
SoP modules
Pooling and ReLU module
NEUDNN
NeuDNN front-end
NeuDNN Back-End
EXPERIMENTAL RESULTS
Hardware implementation evaluation
VGG-16
ResNet-18
GPP-accelerated layers performance analysis
Comparison with State of The Art
CONCLUSION