Abstract

Deep Neural Networks (DNNs) have progressed significantly in recent years. Unlike earlier methods, which required searching for suitable features or algorithms, DNN-based methods allow tasks such as image and speech recognition to be performed easily and efficiently. However, DNN computations typically consume a significant amount of time and high-performance computing resources. To facilitate high-speed object recognition, this article introduces a Deep Convolutional Neural Network (DCNN) accelerator based on a field-programmable gate array (FPGA). Our hardware takes full advantage of the characteristics of convolutional calculation, which allowed us to implement all DCNN layers, from image input to classification, on a single chip. In particular, the dataflow from input to classification is uninterrupted and parallel. As a result, our implementation achieved a speed of 409.62 giga-operations per second (GOPS), which is approximately twice as fast as the latest reported result. Furthermore, we used the same architecture to implement a Recurrent Convolutional Neural Network (RCNN), which can, in theory, provide better recognition accuracy.
