Abstract

Conventional Convolutional Neural Networks (CNNs), which are realized in the spatial domain, exhibit high computational complexity. This results in high resource utilization and memory usage, making them unsuitable for resource- and energy-constrained embedded systems. A promising approach to low-complexity, high-speed solutions is to model the CNN in the spectral domain. One of the main challenges in this approach is the design of activation functions. Some proposed solutions perform activation in the spatial domain, necessitating multiple, computationally expensive switches between the spatial and spectral domains. On the other hand, recent work on spectral activation functions has resulted in very computationally intensive solutions. This paper proposes a complex-valued activation function for spectral-domain CNNs that transmits only those input values whose real or imaginary component is positive. This activation function is computationally inexpensive in both forward and backward propagation and provides sufficient nonlinearity to ensure high classification accuracy. We apply this complex-valued activation function in a LeNet-5 architecture and achieve accuracy gains of up to 7% on MNIST and 6% on Fashion MNIST, while providing up to 79% and 85% faster inference times, respectively, than state-of-the-art spectral-domain activation functions.
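
As a rough illustration of the activation rule described above, the following minimal NumPy sketch passes a complex value unchanged when its real or imaginary part is positive and zeroes it otherwise; this is an assumed pass/zero formulation for illustration only, and the paper's exact definition (including its backward pass) may differ.

    import numpy as np

    def spectral_activation(z: np.ndarray) -> np.ndarray:
        # Sketch of the rule from the abstract (not the authors' exact
        # formulation): transmit z if Re(z) > 0 or Im(z) > 0, else zero.
        mask = (z.real > 0) | (z.imag > 0)
        return np.where(mask, z, 0)

    z = np.array([1 + 2j, -1 + 0.5j, -2 - 3j, 0.3 - 1j])
    print(spectral_activation(z))  # [1.+2.j  -1.+0.5j  0.+0.j  0.3-1.j]

Because the mask is a simple comparison and the surviving values are passed through unchanged, both the forward evaluation and the gradient computation stay element-wise and cheap, which matches the low-complexity claim above.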

Highlights

  • We have developed a spectral-domain Convolutional Neural Network (CNN) model for LeNet-5, in which spatial convolution and pooling are replaced with point-wise products and spectral pooling (see the sketch after these highlights), and the proposed complex-valued activation function is employed in place of the sigmoid or Rectified Linear Unit (ReLU) activation functions of the original model

  • We have applied all of these activation functions in our CNN model and tested them on the same MNIST and Fashion MNIST datasets, so that a fair comparison can be made between this work and related previous work
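
The replacement of spatial convolution by a point-wise spectral product rests on the convolution theorem. The following minimal NumPy sketch (toy 8x8 arrays, not the paper's actual layer implementation) verifies the equivalence for circular convolution.

    import numpy as np

    def circ_conv2d(x, k):
        # Direct 2-D circular convolution in the spatial domain
        # (slow reference implementation).
        n0, n1 = x.shape
        out = np.zeros_like(x)
        for i in range(n0):
            for j in range(n1):
                for a in range(n0):
                    for b in range(n1):
                        out[i, j] += x[a, b] * k[(i - a) % n0, (j - b) % n1]
        return out

    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 8))   # toy feature map
    k = rng.standard_normal((8, 8))   # kernel, same size as the map

    # Convolution theorem: a point-wise product in the spectral domain
    # equals circular convolution in the spatial domain.
    spectral = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))

    print(np.allclose(circ_conv2d(x, k), spectral))  # True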

Introduction

An artificial neural network (ANN) is based on a collection of connected nodes called neurons. An ANN must have an input layer, one or more hidden layers, and an output layer. The output of each neuron is computed (activated) by some nonlinear function of the sum of its inputs. It is highly desirable that activation functions be nonlinear and differentiable. The hidden layers should employ nonlinear activation functions so as to enable the network to learn complex relationships from the input data [1]. Through nonlinear activation functions, an ANN can learn any nonlinear behavior, provided the network has enough neurons and layers [2].
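
As a minimal illustration of that computation, the sketch below applies a nonlinear function to the weighted sum of a neuron layer's inputs; the layer sizes and the choice of tanh are arbitrary assumptions for illustration, not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 3))   # 4 neurons, each with 3 inputs
    b = np.zeros(4)                   # per-neuron bias
    x = rng.standard_normal(3)        # input vector

    # Each neuron's output: a nonlinear function of the sum of its
    # weighted inputs plus bias.
    y = np.tanh(W @ x + b)
    print(y)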
