A multistage dataflow implementation of a Deep Convolutional Neural Network based on FPGA for high-speed object recognition

Ning Li, Yoichi Tomioka, Shunpei Takaki, Hitoshi Kitazawa

doi:10.1109/ssiai.2016.7459201

Abstract

Deep Neural Networks (DNNs) have progressed significantly in recent years. Novel DNN methods allow tasks such as image and speech recognition to be performed easily and efficiently, compared with previous methods that required a search for valid feature values or algorithms. However, DNN computations typically consume a significant amount of time and high-performance computing resources. To facilitate high-speed object recognition, this article introduces a Deep Convolutional Neural Network (DCNN) accelerator based on a field-programmable gate array (FPGA). Our hardware takes full advantage of the characteristics of convolutional calculation; this allowed us to implement all DCNN layers, from image input to classification, on a single chip. In particular, the dataflow from input to classification is uninterrupted and parallelized. As a result, our implementation achieved a speed of 409.62 giga-operations per second (GOPS), which is approximately twice as fast as the latest reported result. Furthermore, we used the same architecture to implement a Recurrent Convolutional Neural Network (RCNN), which can, in theory, provide better recognition accuracy.
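
To make the GOPS unit concrete, the following is a minimal sketch of how a throughput figure of this kind can be derived for a single convolutional layer. It assumes the common convention that one multiply-accumulate counts as two operations; the abstract does not state how operations were counted. The function names, layer dimensions, and timing are hypothetical and are not taken from the paper.

```python
def conv_layer_ops(out_h, out_w, in_ch, out_ch, kernel):
    """Operation count for one convolutional layer (bias and activation ignored).

    Assumption: each multiply-accumulate is counted as two operations
    (one multiply and one add).
    """
    macs = out_h * out_w * in_ch * out_ch * kernel * kernel
    return 2 * macs


def gops(total_ops, seconds):
    """Throughput in giga-operations per second."""
    return total_ops / seconds / 1e9


# Hypothetical layer: 28x28 output map, 16 input channels,
# 32 output channels, 5x5 kernels. Not the paper's network.
ops = conv_layer_ops(out_h=28, out_w=28, in_ch=16, out_ch=32, kernel=5)

# Hypothetical processing time of 100 microseconds for this layer.
print(ops, gops(ops, 1e-4))
```

For a whole network, the same calculation would sum the operation counts of all layers and divide by the end-to-end time per image.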

Full Text