Abstract

Standard convolutional neural networks (CNNs) exhibit substantial redundancy: comparable accuracy can be obtained with low-bit weights instead of floating-point representations. Most CNNs are developed and executed on high-end GPU-based workstations, and porting existing implementations to portable edge FPGAs is difficult because of limited on-chip block memory and battery capacity. In this paper, we present the adaptive pointwise convolution and 2D convolution joint network (AP2D-Net), an ultra-low-power, relatively high-throughput system combined with dynamic-precision weights and activations. Our system delivers high performance while trading off accuracy against power efficiency for unmanned aerial vehicle (UAV) object detection scenarios. We evaluate the system on the Zynq UltraScale+ MPSoC Ultra96 mobile FPGA platform. The target board achieves a real-time speed of 30 fps under 5.6 W, with an FPGA on-chip power of only 0.6 W. The power efficiency of our system is 2.8× better than the best system design on a Jetson TX2 GPU and 1.9× better than the design on a PYNQ-Z1 SoC FPGA.
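The abstract's claim that low-bit weights can replace floating-point representations is the standard uniform quantization idea. The sketch below is an illustrative toy example, not code from the paper: it maps float weights to signed n-bit integer codes with a symmetric scale and reconstructs approximate floats from them.

```python
def quantize(weights, bits):
    """Uniformly quantize a list of float weights to signed `bits`-bit
    integer codes using a symmetric scale, then dequantize for comparison.
    Toy illustration of low-bit weight storage; not the paper's scheme."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax    # largest weight maps to qmax
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, [c * scale for c in codes]       # integer codes, reconstruction

w = [0.91, -0.42, 0.07, -0.88]
codes, approx = quantize(w, 8)
# codes stay within [-127, 127]; reconstruction error is below 1e-2 here
```

Storing such codes instead of 32-bit floats cuts weight memory 4× at 8 bits (more at lower precisions), which is what makes block-RAM-limited edge FPGAs viable targets.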

Highlights

  • Convolutional neural network (CNN)-based deep learning (DL) algorithms are widely used in autonomous driving, natural language processing, web recommendation systems, etc., which greatly improve the quality of life of modern society

  • Processing a single image takes on the order of giga floating-point operations (GFLOP), which is far beyond the computational ability of the central processing unit (CPU) and hard to run in real-time

  • We developed the unmanned aerial vehicle (UAV) object detection system for real-time, high accuracy, and low power application combined with register-transfer level (RTL) intellectual property (IP) such as direct memory access (DMA), AXI4-stream, and digital signal processors (DSPs) to design our CNN


Introduction

Convolutional neural network (CNN)-based deep learning (DL) algorithms are widely used in autonomous driving, natural language processing, web recommendation systems, etc., greatly improving the quality of life of modern society. For more intricate tasks, the number of CNN model parameters grows exponentially. Processing a single image takes on the order of giga floating-point operations (GFLOP), which is far beyond the computational ability of the central processing unit (CPU) and hard to run in real-time. To handle these compute-intensive tasks, researchers leverage the advantages of the graphics processing unit (GPU), such as high bandwidth and thread parallelism. In contrast, the bandwidth and on-chip memory of the FPGA are limited compared with a modern GPU; design challenges such as low bandwidth and limited cache size make real-time operation difficult.
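The "order of giga floating-point operations" figure can be checked with a back-of-the-envelope count. The helper below is a generic estimate (not taken from the paper), counting 2 FLOPs per multiply-accumulate for a stride-1, same-padded convolution; the layer shape is a hypothetical early detector layer chosen only for illustration.

```python
def conv2d_flops(h, w, c_in, c_out, k):
    """FLOP estimate for one conv layer: every output pixel (h*w) of every
    output channel needs c_in * k * k multiply-accumulates, and each MAC
    counts as 2 FLOPs (one multiply plus one add)."""
    return 2 * h * w * c_in * c_out * k * k

# Hypothetical 3x3 layer on a 416x416 RGB input with 32 output channels:
flops = conv2d_flops(416, 416, 3, 32, 3)
print(flops / 1e9)  # ~0.3 GFLOP for this single shallow layer
```

Since deeper layers have far more channels, summing over a whole detection network readily reaches several GFLOP per image, which motivates offloading the convolutions to GPU or FPGA accelerators.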

Related Work
Optimization of the Computational Kernels
Bandwidth Optimization to Improve Throughput
Model Optimization
Binary Neural Networks
Implementation Methodologies
Proposed System Architecture and IP Block Design
AP2D-Net Modeling of the CNN-Based FPGA Accelerator
Structure of AP2D-Net
Feature Extraction
Classification and Regression
AP2D-NET System Design on FPGA
Overall Architecture of the AP2D-Net Accelerator
Optimization on a Heterogeneous System
Dataset
Training
Evaluation Criteria
AP2D-Net Modeling
Trade-Off between Working Frequency and Energy Consumption
Hardware Usage on FPGA
Conclusions

