Fixed Point Implementation of Tiny-Yolo-v2 using OpenCL on FPGA

Yap June Wai, Zulkalnain Bin, Lim Kim, Sani Irwan

doi:10.14569/ijacsa.2018.091062


Open Access


Abstract

Deep Convolutional Neural Network (CNN) algorithms have recently gained popularity in many applications such as image classification, video analytics, and object detection. Being compute-intensive and memory-expensive, CNN-based algorithms are hard to implement on embedded devices. Although recent studies have explored hardware implementations of CNN-based object classification models such as AlexNet and VGG, implementations of CNN-based object detection models on Field Programmable Gate Arrays (FPGAs) remain rare. Consequently, this study proposes a 16-bit fixed-point implementation of a CNN-based object detection model, Tiny-Yolo-v2, on the Cyclone V PCIe Development Kit FPGA board using a High-Level Synthesis (HLS) tool, OpenCL. Considering FPGA resource constraints in terms of computational resources, memory bandwidth, and on-chip memory, a data pre-processing approach is proposed to merge batch normalization into the convolution layer. To the best of our knowledge, this is the first implementation of the Tiny-Yolo-v2 object detection algorithm on FPGA using the Intel FPGA Software Development Kit (SDK) for OpenCL. Finally, the proposed implementation achieves a peak performance of 21 GOPs at a working frequency of 100 MHz.
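The merging of batch normalization into the convolution layer mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard per-channel folding identity, not necessarily the authors' exact pre-processing. The function names, the flat per-channel weight layout, and the `eps` value are assumptions made for illustration only.

```python
import math

def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias.

    weights: one flat list of weights per output channel (assumed layout)
    bias, gamma, beta, mean, var: one float per output channel
    Returns (folded_weights, folded_bias) so that
    conv(x, folded_weights) + folded_bias == batch_norm(conv(x, weights) + bias).
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)               # per-channel scale factor
        folded_w.append([w * scale for w in w_c])    # scale every weight
        folded_b.append((b_c - mu) * scale + bt)     # absorb mean and shift
    return folded_w, folded_b

def conv_dot(w_c, b_c, patch):
    """One output value of a convolution: dot product of weights and patch."""
    return sum(w * x for w, x in zip(w_c, patch)) + b_c

def batch_norm(y, g, bt, mu, v, eps=1e-5):
    """Inference-time batch normalization of a single value."""
    return g * (y - mu) / math.sqrt(v + eps) + bt
```

After folding, the batch-normalization step no longer has to be computed at inference time, which removes its parameters and operations from the device side.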

Highlights

Convolutional Neural Network (CNN) is a well-known deep learning architecture inspired by the artificial neural network
In the OpenCL framework, the Central Processing Unit (CPU) acts as the host and is connected through bridges to the Cyclone V PCIe Field Programmable Gate Array (FPGA) board, which serves as the OpenCL device, forming a heterogeneous computing system
The proposed design is compared to a software implementation (CPU) with the two scalable design parameters set to BLOCK_SIZE=32 and Single Instruction Multiple Data (SIMD)=4

Summary

Introduction

Convolutional Neural Network (CNN) is a well-known deep learning architecture inspired by the artificial neural network. State-of-the-art CNN algorithms usually require millions of parameters and billions of operations to process a single input image. This makes implementing CNN algorithms on an embedded system a great challenge, owing to severe hardware constraints such as computational resources, memory bandwidth, and on-chip memory. In recent years, the Field Programmable Gate Array (FPGA) has become an attractive alternative for accelerating CNN-based algorithms due to its relatively high performance, flexibility, energy efficiency, and fast development cycle, especially with the release of the High-Level Synthesis (HLS) tool OpenCL. OpenCL greatly reduces programming complexity by enabling automatic compilation from a high-level program (C/C++) to register-transfer level (RTL). On the host side, C/C++ code runs on the CPU, and a vendor-specific Application Programming Interface (API) is used to communicate with the kernels implemented on the Cyclone V PCIe FPGA board.
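One common way to ease the computational and memory constraints described above is the 16-bit fixed-point representation named in the abstract. The following is a rough sketch of signed fixed-point quantization with saturation. The split into 8 fractional bits, the rounding mode, and all function names are assumptions chosen for illustration, as the page does not state the format used in the paper.

```python
def to_fixed(x, frac_bits=8, total_bits=16):
    """Quantize a float to a signed fixed-point integer, saturating on overflow."""
    lo = -(1 << (total_bits - 1))        # most negative representable value
    hi = (1 << (total_bits - 1)) - 1     # most positive representable value
    q = int(round(x * (1 << frac_bits)))  # scale by 2^frac_bits and round
    return max(lo, min(hi, q))

def to_float(q, frac_bits=8):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

def fixed_mul(a, b, frac_bits=8, total_bits=16):
    """Multiply two fixed-point values and rescale to the same format."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    prod = (a * b) >> frac_bits          # the raw product has 2*frac_bits fraction bits
    return max(lo, min(hi, prod))
```

With these assumed settings, values are stored in steps of 1/256 and saturate near ±128, so the choice of fractional bits trades range against precision.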



Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2018
Citations: 60
License type: cc-by


Similar Papers

Efficient Hardware Optimization for CNN
Seda Güzel Aydın ... Hasan Şakir Bilge
International Journal of Multidisciplinary Studies and Innovative Technologies | VOL. 6
01 Jan 2021

FFConv
Afzal Ahmad ... Muhammad Adeel Pasha
ACM Transactions on Embedded Computing Systems | VOL. 19
11 Mar 2020

CPU-Accelerator Co-Scheduling for CNN Acceleration at the Edge
Yeongmin Kim ... Arslan Munir
IEEE Access | VOL. 8
01 Jan 2020

A novel hardware-oriented ultra-high-speed object detection algorithm based on convolutional neural network
Jianquan Li ... De Xu
Journal of Real-Time Image Processing | VOL. 17
21 Dec 2019
