Efficient Memory Organization for DNN Hardware Accelerator Implementation on PSoC

Antonio Rios-Navarro, Daniel Gutierrez-Galan, Lourdes Duran-Lopez, Juan Pedro Dominguez-Morales, Enrique Piñero-Fuentes, Ricardo Tapiador-Morales, Manuel Jesús Dominguez-Morales

doi:10.3390/electronics10010094

Open Access

https://doi.org/10.3390/electronics10010094

Journal: Electronics	Publication Date: Jan 5, 2021
Citations: 4	License type: CC BY 4.0

Affiliation: Universidad de Sevilla

Abstract

The use of deep learning solutions in different disciplines is increasing, and their algorithms are computationally expensive in most cases. For this reason, numerous hardware accelerators have appeared that compute their operations efficiently in parallel, achieving higher performance and lower latency. These algorithms need large amounts of data to feed each of their computing layers, which makes it necessary to efficiently handle the data transfers that feed and collect the information to and from the accelerators. Hybrid devices are widely used to implement these accelerators; they combine an embedded computer, on which an operating system can run, with a field-programmable gate array (FPGA), on which the accelerator can be deployed. In this work, we present a software API that efficiently organizes the memory, avoiding the reallocation of data from one memory area to another, which improves on the native Linux driver with an 85% speed-up and reduces the frame computing time by 28% in a real application.

Highlights

Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are the most widely used; the former are typically used for image processing, while the latter are mostly used for time-dependent signals
The first test, loopback, was an implementation where the MM2S output and S2MM input channels were connected at the Programmable Logic (PL) side
This connection allowed data sent from the Processing System (PS) to the PL to be sent back from the PL to the PS without any modification
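
The loopback arrangement described above can be illustrated with a purely software stand-in. The sketch below is not the authors' test code: `LoopbackChannel`, `mm2s_write`, `s2mm_read` and `loopback_ok` are hypothetical names, and an in-memory queue is assumed in place of the PL-side wiring between the MM2S output and the S2MM input. It only shows the check such a test performs, namely that data sent from the PS comes back unmodified.

```python
from collections import deque


class LoopbackChannel:
    """Software stand-in for a PL-side loopback (hypothetical model).

    Whatever is written on the MM2S side becomes readable, unchanged and
    in order, on the S2MM side.
    """

    def __init__(self):
        self._fifo = deque()

    def mm2s_write(self, payload: bytes) -> None:
        # PS -> PL direction: queue an immutable copy of the payload.
        self._fifo.append(bytes(payload))

    def s2mm_read(self) -> bytes:
        # PL -> PS direction: return the oldest queued payload.
        return self._fifo.popleft()


def loopback_ok(channel: LoopbackChannel, payload: bytes) -> bool:
    """Send a payload through the channel and check it returns unmodified."""
    channel.mm2s_write(payload)
    return channel.s2mm_read() == payload


if __name__ == "__main__":
    ch = LoopbackChannel()
    data = bytes(range(256)) * 4  # 1 KiB test pattern
    print("loopback intact:", loopback_ok(ch, data))
```

On real hardware the two methods would be replaced by actual DMA transfers; the comparison step stays the same.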

Summary

Introduction

Deep Learning (DL) has grown by leaps and bounds over the past several years and currently offers solutions to problems in many scientific fields [1]: in computer vision, where some DL algorithms even outperform humans [2]; in natural language processing, where DL is used to recognize spoken words and phrases [3]; in robotics, where it is used for robot navigation, grasping and object manipulation [4]; and in control theory, where DL is used to design system controllers [5]. These kinds of algorithms must be trained in order to solve a specific task, and they have achieved very good results, in some cases improving upon the performance achieved by humans [6,7].

PSoC System Architecture

BiMapTab API

Axidma

Results

Loopback Test

Accelerator Integration Test

Conclusions
