Abstract

Customized hardware-based convolutional neural network (CNN or ConvNet) accelerators have attracted significant attention for low-cost edge computing systems. However, little research has sought to optimize at the algorithm and hardware levels simultaneously in resource-constrained FPGA systems. In this paper, we first analyze ConvNet models to find the one most suitable for a low-cost FPGA implementation. Based on this analysis, we select MobileNetV2 as the backbone of our research because of its hardware-friendly structure. We use a quantized implementation with 4-bit precision and optimize further with a smaller input resolution of 192 × 192 to obtain a 68.8% detection accuracy on ImageNet, which represents only a 3.2% accuracy loss compared to a floating-point model that uses the full input size. We then develop a hardware implementation on a low-cost FPGA. To accelerate the depth-wise separable ConvNet and use DRAM resources efficiently with parallel processing, we propose a novel scoreboard architecture that dynamically schedules DRAM data requests to maintain high hardware utilization. The design uses about six times fewer DSP blocks than prior work, and its internal block RAM utilization is approximately nine times more efficient. Our proposed design achieves 3.07 frames per second (FPS) on the low-cost, resource-constrained FPGA system.
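
As a rough illustration of why a depth-wise separable network at a reduced input resolution suits a resource-constrained device, the sketch below counts multiply-accumulate (MAC) operations for a single layer. It is not taken from the paper: the function names and layer dimensions are hypothetical, and the full input size is assumed to be 224 × 224, which the abstract does not state.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a k x k depth-wise convolution followed by a 1 x 1 point-wise one."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def resolution_scale(small, full):
    """Fraction of per-layer MACs left when the input side shrinks from full to small."""
    return (small / full) ** 2


if __name__ == "__main__":
    # Hypothetical layer: 14 x 14 output, 32 -> 64 channels, 3 x 3 kernel.
    std = standard_conv_macs(14, 14, 32, 64, 3)
    sep = depthwise_separable_macs(14, 14, 32, 64, 3)
    print(f"standard: {std} MACs, separable: {sep} MACs, ratio: {sep / std:.3f}")

    # Assumed full input size of 224 x 224 versus the 192 x 192 used here.
    print(f"MAC fraction at 192 vs 224: {resolution_scale(192, 224):.3f}")
```

The separable-to-standard ratio simplifies to 1/c_out + 1/k², so for 3 × 3 kernels the saving approaches a factor of nine as the number of output channels grows.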
