Abstract

Convolutional neural networks (CNNs) are a computation- and memory-demanding class of deep neural networks. Because of this high computational complexity, field-programmable gate arrays (FPGAs) are often used to accelerate CNNs deployed on embedded platforms. In most cases, CNNs are trained with existing deep learning frameworks and then mapped to FPGAs with specialized toolflows. In this paper, we propose a CNN core architecture called mNet2FPGA that places a trained CNN on a SoC FPGA. The processing system (PS) configures the convolutional and fully connected cores according to a list of prescheduled instructions, while the programmable logic holds the cores themselves. The hardware architecture is based on advanced extensible interface (AXI) stream processing with simultaneous bidirectional transfers between RAM and the CNN core. The core was tested on a cost-optimized Z-7020 FPGA with 16-bit fixed-point VGG networks. Kernel binarization and merging with the batch normalization layer were applied to reduce the number of DSPs in the multi-channel convolutional core. The convolutional core processes eight input feature maps at once and generates eight output channels of the same size and composition at 50 MHz. The fully connected (FC) layer core runs at 100 MHz with up to 4096 neurons per layer. In the current version of the CNN core, the convolutional kernel size is fixed at 3×3. The estimated average performance is 8.6 GOPS for VGG13 and nearly 8.4 GOPS for VGG16/19 networks.
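As a rough illustration of the pre-processing mentioned in the abstract, batch normalization parameters can be folded into the convolution kernels before quantization to 16-bit fixed point. The sketch below shows a generic folding step and a signed fixed-point conversion; the function names, the Q7.8 format, and the folding formula are illustrative assumptions and not the paper's exact procedure.

```python
import numpy as np

def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-normalization parameters into convolution weights and bias.

    weights: (out_ch, in_ch, 3, 3) convolution kernels
    bias:    (out_ch,) convolution bias (zeros if the layer has none)
    gamma, beta, mean, var: (out_ch,) batch-norm parameters
    """
    scale = gamma / np.sqrt(var + eps)                  # per-output-channel scale
    folded_w = weights * scale[:, None, None, None]     # scale each kernel
    folded_b = (bias - mean) * scale + beta             # adjust the bias term
    return folded_w, folded_b

def to_fixed_point(x, frac_bits=8, word_bits=16):
    """Quantize to signed fixed point (assumed Q7.8 here for a 16-bit word)."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return np.clip(np.round(x * scale), lo, hi).astype(np.int16)
```

Folding removes the separate batch normalization stage at inference time, which is one way the number of per-channel multiplications (and hence DSP usage) can be reduced in a fixed-point convolutional core.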

Highlights

  • Convolutional neural networks (CNNs) have spread widely in the last decade

  • The VGG networks were selected for implementation since they use 3 × 3 kernels, which are suitable for the proposed CNN core

  • We propose an approach to map fixed-point CNNs onto low-density SoC field-programmable gate arrays (FPGAs)

Introduction

Convolutional neural networks (CNNs) are a class of deep learning networks that have spread widely in the last decade. They are most commonly used for machine vision tasks such as image classification [1,2], and even for the identification of dynamic systems [3]. Deep neural networks are commonly trained using floating-point arithmetic, and most deep learning systems are designed to be trained and executed on systems with graphics processing units, which provide up to several thousand computing cores and large external memory bandwidth.
