Abstract

The use of deep learning solutions across disciplines is increasing, and their algorithms are computationally expensive in most cases. Numerous hardware accelerators have therefore appeared to compute their operations efficiently in parallel, achieving higher performance and lower latency. These algorithms need large amounts of data to feed each of their computing layers, which makes it necessary to efficiently handle the data transfers that feed and collect information to and from the accelerators. To implement these accelerators, hybrid devices are widely used: they combine an embedded computer, on which an operating system can run, with a field-programmable gate array (FPGA), on which the accelerator can be deployed. In this work, we present a software API that efficiently organizes memory and prevents reallocating data from one memory area to another, which improves on the native Linux driver with an 85% speed-up and reduces the frame computing time by 28% in a real application.

Highlights

  • Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are the most widely used architectures; the former is typically used for image processing, while the latter is mostly used for time-dependent signals

  • In the first test, a loopback, the MM2S output channel and the S2MM input channel were connected on the Programmable Logic (PL) side

  • This connection allowed data sent from the Processing System (PS) to the PL to be sent back from the PL to the PS without any modification



Introduction

Deep Learning (DL) has grown by leaps and bounds for several years and currently offers solutions to problems in many scientific fields [1]: in computer vision, where some DL algorithms even outperform human performance [2]; in natural language processing, where DL is used to recognize spoken words and phrases [3]; in robotics, where it is used for robot navigation, grasping, and object manipulation [4]; and in control theory, where DL is used to design system controllers [5]. These kinds of algorithms can and must be trained to solve a specific task, and they have achieved very good results, in some cases improving upon the performance achieved by humans [6,7].

PSoC System Architecture
BiMapTab API
Axidma
Results
Loopback Test
Accelerator Integration Test
Conclusions
