Convolutional neural networks (CNNs) have become a primary approach in the field of artificial intelligence (AI), with wide range of applications. The two computational phases for every neural network are; the training phase and the testing phase. Usually, testing is performed on high-processing hardware engines, however, the training part is still a challenge for low-power devices. There are several neural accelerators; such as graphics processing units and field-programmable-gate-arrays (FPGAs). From the design perspective, an efficient hardware engine at the register-transfer level and efficient CNN modeling at the TensorFlow level are mandatory for any type of application. Hence, we propose a comprehensive, and step-by-step design procedure for a re-configurable CNN engine. We used TensorFlow and Keras libraries for modeling in Python, whereas the register-transfer-level part was performed using Verilog. The proposed idea was synthesized, placed, and routed for 180 nm complementary metal-oxide semiconductor technology using synopsis design compiler tools. The proposed design layout occupies an area of 3.16 × 3.16 mm2. A competitive accuracy of approximately 96% was achieved for the Modified National Institute of Standards and Technology (MNIST) and Canadian Institute for Advanced Research (CIFAR-10) datasets.
Read full abstract