Abstract

The advent of deep learning has revolutionized the domain of computer vision. Convolutional neural networks (CNNs) have become the state of the art for solving complex tasks, thanks to advances in high-end accelerators such as GPUs and FPGAs, combined in clusters or cloud solutions. CNNs are also of great interest in embedded systems. However, these devices often cannot afford to offload computationally intensive workloads to the cloud because of strict energy or real-time constraints. Tightly Coupled Processor Arrays (TCPAs) are ideal architectures for accelerating nested loop programs at high energy efficiency. In this demonstrator, we show how TCPAs can meet these requirements at the edge. For illustration, we designed a CNN-based hand sign recognition system that is accelerated on a TCPA, implemented the TCPA as a prototype overlay on a Xilinx Zynq System-on-a-Chip (SoC), and showcase substantial speedups over the integrated ARM Cortex-A9 processor.

