Abstract

Training convolutional neural networks (CNNs) in the frequency domain is of great significance for extending the deep learning principle to the frequency domain. However, representing the ConvNet architecture in the frequency domain is highly demanding because of its complicated Fourier domain training characteristics. Accurate and unambiguous representation strategies are therefore needed to train CNNs entirely in the Fourier domain. Building on the bin decomposition mechanism and the theory of non-saturating activations, this paper proposes an accurate, stable, and efficient Fourier domain training framework for CNNs. The framework contains two key Fourier domain representations: the Fourier domain exponential linear unit and the pyramid pooling layer. The former alleviates the vanishing-gradient phenomenon and makes CNNs easier to converge in the Fourier domain; the latter removes the original cropping or warping steps and improves classification accuracy. With the framework, Fourier domain training accuracy is improved without sacrificing graphics processing unit (GPU) throughput. With ResNet-50 as the backbone, the top-1 and top-5 classification errors are reduced from 28.85 and 9.55 to 18.63 and 4.05, respectively, while the speedup ratios of the framework reach up to 4.9877 and 1.8997 at a batch size of 128 on an NVIDIA GeForce RTX 2080 GPU (8.92 TFLOPS). The average difference between the classification value and the ground-truth value is only 0.21 on the MetaGram-1 set, indicating a strong goodness-of-fit and the robustness of the framework. This investigation illustrates that the proposed Fourier domain CNN framework, with its sophisticated Fourier domain representation strategy, is highly efficient and accurate.
It may therefore serve as a baseline framework for establishing training pipelines for Fourier domain CNNs, improving the deep learning accuracy of CNNs and extending the Fourier domain representation strategy to other deep learning networks.
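The feasibility of training entirely in the Fourier domain rests on the convolution theorem: a spatial convolution becomes a pointwise multiplication of spectra. The abstract does not give the paper's exact formulation, so the following is only a minimal NumPy sketch of that equivalence for circular 2-D convolution; the function names are illustrative, not from the paper.

```python
import numpy as np

def fft_conv2d(image, kernel):
    """Circular 2-D convolution via the convolution theorem:
    pointwise multiplication in the Fourier domain."""
    H, W = image.shape
    kh, kw = kernel.shape
    # Zero-pad the kernel to the image size, then transform both.
    k = np.zeros((H, W))
    k[:kh, :kw] = kernel
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(k)))

def direct_circular_conv2d(image, kernel):
    """Reference: direct circular convolution in the spatial domain."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            for u in range(kh):
                for v in range(kw):
                    out[i, j] += kernel[u, v] * image[(i - u) % H, (j - v) % W]
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
ker = rng.standard_normal((3, 3))
# Both paths compute the same circular convolution.
assert np.allclose(fft_conv2d(img, ker), direct_circular_conv2d(img, ker))
```

Keeping the network's weights and activations in the spectral domain avoids repeating the forward/inverse transforms at every layer, which is the time-frequency transformation cost the paper's framework is designed to eliminate.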

Highlights

  • The convolutional neural network is a significant deep learning framework for image classification, object detection, natural language processing, etc. [1]–[6]

  • An unsaturated Fourier domain activation function is of great significance in alleviating the vanishing-gradient problem and making the network easier to converge during the frequency domain training phase [39]–[43]

  • Because CUDA [47], [48] and deep learning frameworks (e.g., Caffe [49]) run on fixed-size inputs, we incorporate the spatial domain pyramid pooling method [44] to build a training solution for inputs of arbitrary size, which ensures that our Fourier domain pyramid pooling layer can be trained and tested under the CUDA and Caffe implementations
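The spatial pyramid pooling idea referenced in [44] can be sketched in a few lines: each pyramid level partitions the feature map into an n×n grid of bins and max-pools each bin, so the concatenated output length depends only on the channel count and the chosen levels, never on the input height or width. This is a generic NumPy illustration, not the paper's Fourier domain pyramid pooling layer.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into an n x n grid per pyramid
    level and concatenate: a fixed-length vector for any H and W."""
    C, H, W = feature_map.shape
    outputs = []
    for n in levels:
        # Bin edges span the full map even when H or W is not divisible by n.
        h_edges = np.linspace(0, H, n + 1).astype(int)
        w_edges = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                bin_ = feature_map[:, h_edges[i]:h_edges[i + 1],
                                   w_edges[j]:w_edges[j + 1]]
                outputs.append(bin_.max(axis=(1, 2)))
    # Length = C * sum(n * n for n in levels), independent of H, W.
    return np.concatenate(outputs)

# Two different input sizes yield the same output length.
a = spatial_pyramid_pool(np.random.rand(3, 13, 17))
b = spatial_pyramid_pool(np.random.rand(3, 32, 24))
assert a.shape == b.shape == (3 * (1 + 4 + 16),)
```

Because the pooled vector has a fixed length, the fully connected layers that follow can be compiled once for the CUDA/Caffe pipeline while the convolutional front end still accepts arbitrary input sizes, avoiding the cropping or warping steps mentioned in the abstract.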


Summary

INTRODUCTION

The convolutional neural network is a significant deep learning framework for image classification, object detection, natural language processing, etc. [1]–[6]. Jong Hwan Ko et al. proposed an energy-efficient accelerator for CNNs using Fourier domain computations (abbreviated as koCNN) [34], in which the spectral pooling strategy [35] and discrete sinc interpolation are the two key methods for Fourier domain training. However, koCNN yields inaccurate results: the spectral pooling strategy struggles to transmit the incomplete kernel spectrum to the previous neurons, and the tanh and sigmoid operations that koCNN employs are saturating activations, which degrade the precision of the weights during backpropagation. In this case, accuracy often has to be sacrificed to achieve low computational complexity. The proposed architecture achieves Fourier domain training and testing of networks on the basis of the bin decomposition mechanism and non-saturating activation functions, without sacrificing classification accuracy or depending on a time-consuming time-frequency transformation strategy.
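The saturation problem attributed to tanh and sigmoid above can be made concrete with a small numerical check: the sigmoid gradient collapses toward zero for large activations, while an exponential linear unit (ELU) keeps a unit gradient on the positive side. This is a plain spatial-domain sketch for intuition only; the paper's Fourier domain ELU, which acts on spectral coefficients, is not reproduced here.

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential linear unit: identity for x > 0 (non-saturating),
    smooth exponential decay toward -alpha for x <= 0."""
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def elu_grad(x, alpha=1.0):
    """Derivative of the ELU: 1 for x > 0, alpha * exp(x) otherwise."""
    return np.where(x > 0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))

def sigmoid_grad(x):
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

# For a large positive activation the sigmoid gradient has all but
# vanished, while the ELU gradient is still exactly 1.
x = np.array([10.0])
assert sigmoid_grad(x)[0] < 1e-4
assert elu_grad(x)[0] == 1.0
```

Multiplying many such near-zero sigmoid gradients across layers is what shrinks the weight updates during backpropagation; a non-saturating activation keeps the gradient magnitude usable through deep Fourier domain networks.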

OVERALL FRAMEWORK
FOURIER DOMAIN BACKWARD
DATASETS AND TRAINING CONFIGURATION
CONCLUSION