In the field of deep learning, Convolutional Neural Networks (CNNs) have become a focal point due to their multi-layered structure and wide range of applications. The success of deep learning stems from models having more layers and more parameters, which gives them stronger nonlinear fitting ability. Traditionally, CNNs run primarily on Central Processing Units (CPUs) and Graphics Processing Units (GPUs). However, CPUs offer limited computational power, and GPUs consume considerable energy. In contrast, Field-Programmable Gate Arrays (FPGAs) offer high parallelism, low power consumption, flexible programming, and rapid development cycles. These combined advantages make FPGAs better suited than other platforms for the forward inference stage of deep learning. However, CNNs exhibit significant parameter redundancy, which makes the storage cost of deploying them directly on an FPGA prohibitively high. To deploy CNNs on FPGAs, the models must therefore first be compressed. This paper analyzes several specific cases of model compression for convolutional neural networks, and summarizes and compares efficient model compression methods. The results show that these methods fall into two main categories: changing the structure of the convolutional neural network and quantizing its parameters. The two compression methods can also be cascaded to achieve a better optimization effect.
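The abstract does not specify a particular quantization scheme, but the parameter-quantization category it describes can be illustrated with a minimal sketch of uniform symmetric 8-bit weight quantization, a common choice for FPGA deployment. The function names here are hypothetical, not taken from the paper:

```python
import numpy as np

def quantize_weights(weights, num_bits=8):
    """Illustrative uniform symmetric quantization of a float weight tensor
    to signed integers (not the paper's specific method)."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8-bit
    scale = np.max(np.abs(weights)) / qmax  # map the largest magnitude to qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_weights(q, scale):
    """Recover approximate float weights to measure quantization error."""
    return q.astype(np.float32) * scale

# Example: quantize a random 3x3 convolution kernel
w = np.random.randn(3, 3).astype(np.float32)
q, scale = quantize_weights(w)
print("max reconstruction error:", np.max(np.abs(w - dequantize_weights(q, scale))))
```

Storing int8 weights instead of float32 cuts weight memory by roughly 4x, which is the kind of storage saving that makes on-chip FPGA deployment feasible; structural changes such as pruning can then be cascaded with quantization, as the abstract notes.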