A Frequency-Domain Convolutional Neural Network Architecture Based on the Frequency-Domain Randomized Offset Rectified Linear Unit and Frequency-Domain Chunk Max Pooling Method

Jinhua Lin, Jingxia Cui, Lin Ma

doi:10.1109/access.2020.2996250

Abstract

Constructing a convolutional neural network architecture in the frequency domain is important for exploring the theory of deep learning in that domain. However, the forward and backward pipelines needed to train a convolutional neural network in the frequency domain are complex to construct, which places higher requirements on the representation strategy of the frequency-domain activation function and pooling method used in those pipelines. Therefore, a full frequency-domain convolutional neural network architecture requires a frequency-domain representation strategy with high classification accuracy and excellent time performance. In this paper, based on a chunk decomposition mechanism and the construction principle of the frequency-domain unsaturated activation function, a frequency-domain convolutional neural network architecture is proposed. Two important representation strategies are introduced into the frequency-domain forward/backward pipeline: a frequency-domain randomized offset rectified linear unit and a frequency-domain chunk max pooling method. The former can alleviate the vanishing and exploding gradient phenomena in the frequency-domain forward/backward pipeline and ensure the convergence of the convolutional neural network architecture in the frequency-domain training stage; the latter can capture the partial location information and characteristic strength of the frequency-domain neurons and improve the classification performance of the convolutional neural network in the frequency domain. This full frequency-domain convolutional neural network architecture improves the training accuracy of the convolutional neural network in the frequency-domain pipeline.
The results show that, with ResNet-50 as the backbone framework, NVIDIA GeForce CUDA (Compute Unified Device Architecture) as the training pipeline, and $4\times 4$ as the activation block size of the third-level output neuron's characteristic parameter matrix, the proposed convolutional neural network architecture lowers the top-1 error from 24.90% to 17.95% and the top-5 error from 12.85% to 9.23%. Furthermore, when the batch size is equal to 128 (the worst-case bandwidth usage scenario), the acceleration ratio of the proposed architecture still reaches 13.0375 with cuDNN as the reference model. Under the same backbone framework, the proposed architecture is tested on the MetData-1 dataset, and the classification accuracy can reach the maximum value; that is, the average difference is merely 0.18. This finding shows that the proposed architecture can improve the accuracy of the deep learning-based frequency-domain convolutional neural network model without reducing the time performance, and that it expands the frequency-domain representation strategy of the frequency-domain activation function and pooling method.

Highlights

As an important deep learning framework, convolutional neural networks are widely used in many artificial intelligence fields, such as object classification, speech recognition, target tracking and automated driving [1]–[9].
Lin et al.: frequency-domain convolutional neural network model (FCNN) Architecture Based on the frequency-domain randomized offset rectified linear unit (FRReLU) and frequency-domain chunk max pooling method (Fcmp)

Summary

INTRODUCTION

As an important deep learning framework, convolutional neural networks are widely used in many artificial intelligence fields, such as object classification, speech recognition, target tracking, and automated driving [1]–[9]. We propose a frequency-domain randomized offset rectified linear unit and a frequency-domain chunk max pooling operation. Based on a study of the chunk decomposition mechanism and the construction principle of the frequency-domain unsaturated activation function, the frequency-domain training process of the convolutional neural network is realized without reducing the classification precision or relying excessively on the fast Fourier transform method. We call the full frequency-domain architecture FCNN (frequency-domain convolutional neural network) for short. In the third section, the frequency-domain forward pipeline structure of FCNN is given, and the frequency-domain randomized offset rectified linear unit (FRReLU) and the frequency-domain chunk max pooling method (Fcmp) are introduced.
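As background for why convolutional layers can be moved into the frequency domain at all, the following minimal sketch checks the convolution theorem on a small 1-D signal: a circular convolution computed directly matches an element-wise product of discrete Fourier transforms. It is not the FCNN pipeline, FRReLU, or Fcmp, whose definitions are not given in this summary. The function names, the sample signal, and the kernel are illustrative assumptions, and a naive O(n^2) transform stands in for a fast Fourier transform to keep the example dependency-free.

```python
import cmath

def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform; the inverse applies the 1/n factor."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for f in range(n):
        total = sum(signal[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out

def circular_conv_spatial(x, k):
    """Direct circular (wrap-around) convolution in the spatial domain."""
    n = len(x)
    return [sum(k[a] * x[(i - a) % n] for a in range(len(k))) for i in range(n)]

def circular_conv_frequency(x, k):
    """Same convolution as an element-wise product of spectra."""
    n = len(x)
    k_padded = list(k) + [0.0] * (n - len(k))  # zero-pad kernel to signal length
    x_hat = dft(x)
    k_hat = dft(k_padded)
    y_hat = [a * b for a, b in zip(x_hat, k_hat)]
    return [v.real for v in dft(y_hat, inverse=True)]

# Illustrative data (assumed, not from the paper).
x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
k = [0.5, -1.0, 0.25]

spatial = circular_conv_spatial(x, k)
frequency = circular_conv_frequency(x, k)
max_abs_diff = max(abs(s - f) for s, f in zip(spatial, frequency))
print(max_abs_diff)  # difference is at floating-point rounding level
```

The two results agree up to rounding error, which is the property a frequency-domain convolutional layer relies on. The nonlinear steps (activation and pooling) have no such direct spectral equivalent, which is the gap the paper's FRReLU and Fcmp are meant to address.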

OVERALL FRAMEWORK

RESULTS AND DISCUSSION

CONCLUSION